Abstract

In this paper we consider nonparametric estimation for dependent data, where the
observations do not necessarily come from a linear process. We study density estimation
and also discuss associated problems in nonparametric regression using the 2-mixing
dependence measure. We compare the results under 2-mixing with those derived under
the assumption that the process is linear.

In the context of panel time series where one observes data from several individuals, 
it is often too strong to assume the joint linearity of processes. Instead the methods 
developed in this paper enable us to quantify the dependence through 2-mixing which 
allows for nonlinearity. We propose an estimator of the panel mean function and obtain 
its rate of convergence. We show that under certain conditions the rate of convergence 
can be improved by allowing the number of individuals in the panel to increase with 
time.

1 Introduction

Nonparametric estimation for dependent observations has a long history in statistics.
Rosenblatt [1970] first studied density estimation for dependent data. Since then several
authors have considered nonparametric estimation under various assumptions. For example,
Hall and Hart [1990a], Giraitis et al. [1996], Mielniczuk [1997] and Estevas and Vieu [2003]
consider density estimation for linear processes which have long memory, whereas Cheng
and Robinson [1991] consider density estimation for random variables which are nonlinear
functions of a linear process. A notable result is that, if the observations come from a
linear process with short memory, then the usual rate of convergence, known for independent
observations, also holds for dependent observations. On the other hand, for long memory
processes, the rate of convergence is different. Interestingly, although long memory
influences the rate of convergence, it has no influence on the bandwidth choice, which is
the same regardless of short or long memory. In other words, if the observations come from
a linear process, a larger bandwidth does not improve the rate of convergence of the density
estimator. Similar results can also be derived for nonparametric regression problems (cf.
Hall and Hart [1990b], Cheng and Robinson [1994] and Csörgő and Mielniczuk [1995, 1999,
2001]). However, usually it is assumed that the observations come from a linear process or
are functions of a linear process. In the case of linearity, the joint density of the
observations can be characterised (in some sense) in terms of the autocovariances. It is
this representation that allows the mean squared error of the nonparametric estimator to be
derived in terms of the autocovariance function. However, this result does not necessarily
hold when the process is nonlinear.

The assumption of linearity can be relaxed by using the notion of 2-mixing (see Bosq
[1998]), and in this paper we obtain rates of convergence for processes which are 2-mixing.
Unlike the autocovariance function, 2-mixing can be considered as a measure of dependence
between two random variables (see Definition 3.1, below) and the 2-mixing size quantifies
this dependence: a large mixing size indicates little dependence, whereas a small mixing size
indicates large dependence. The 2-mixing size can be established for several types of
processes, for example, linear processes, see Athreya and Pantula [1986], Cline and Pu [1999],
Chanda [1974] and Appendix A.4 (noting that strong mixing implies 2-mixing, though
the converse is not necessarily true) and nonlinear processes, see Masry and Tjøstheim
[1995], Boussama [1998] and Basrak et al. [2002]. Assuming that the 2-mixing size is
sufficiently large, Bosq [1998] obtains the rate of convergence of several nonparametric
estimators. However, despite there being an extensive literature on nonparametric estimation
for linear processes, and some on nonparametric estimation for processes which are 2-mixing
with a sufficiently large 2-mixing size, as far as we are aware very little exists on
nonparametric estimation for nonlinear processes whose 2-mixing size is not sufficiently large
(for example the ARCH($\infty$) process, which is a nonlinear process and can have a small
mixing size). In this paper we address this issue: we consider nonparametric estimation for
dependent data and formulate the results in terms of the 2-mixing size. We study both
density estimation and nonparametric regression problems. A natural application of
the methodology proposed in this paper is to panel time series, where one observes several
individuals over time and associated with each individual are regressors which are known
to influence the individual. We note that even in the case that an individual comes from
a linear time series, there is no guarantee that the dependence between individuals is also
linear. Therefore we quantify the dependence in terms of the 2-mixing size within and
between individuals over time, and consider nonparametric estimation for panel time series
within this framework.

In Section 3 we consider kernel density estimation, in particular we obtain the sampling 
properties of the Rosenblatt-Parzen kernel estimator and obtain a bound for the mean 
squared error under the assumption that the time series are stationary and 2-mixing. We 
show that, like the long memory process, the 2-mixing size can influence the rate of con- 
vergence. But unlike the long memory process, a much larger 2-mixing size is required to 
obtain the usual rate of convergence. Moreover, the optimal bandwidths for the bounds
obtained are influenced by the 2-mixing size: the smaller the 2-mixing size, the larger the
bandwidth. We demonstrate that several problems could arise if one were to falsely suppose
that the observations come from a linear process when they do not. For example, if the usual
optimal bandwidth for linear processes were used on nonlinear processes, the mean squared 
error may no longer converge to zero. Thus our results give a warning to practitioners who 
apply well known results for the linear process, without checking whether the process is 
linear or not. 

In Section 4 we consider nonparametric regression for dependent data. We discuss this
with reference to two models. First we suppose the response and explanatory variables
$(X_t, Z_t)$ satisfy (i) $X_t = \varphi(Z_t) + h(Z_t)\eta_t$, where $\{\eta_t\}$ and $\{Z_t\}$ are
independent of each other, and secondly we assume the conditional expectation satisfies
(ii) $E(X_t|Z_t) = \varphi(Z_t)$. We observe that the latter model includes the former model as a
special case. We estimate $\varphi(\cdot)$ using the classical kernel estimator and derive rates of
convergence similar to those obtained for the density estimator. But in the case of model (i)
the rate of convergence depends on two factors, the 2-mixing size of $\{Z_t\}$ and the rate of
decay of the autocovariance function of $\{\eta_t\}$, whereas for model (ii) the rate of convergence
is determined by the mixing size of the multivariate random process $\{(X_t, Z_t)\}$.

Panel time series are often used to model the relationship and dynamics between several
individuals observed over time, and recently Hjellvik et al. [2004] and Mammen et al.
[2005] have used nonparametric methods in this context. Typically it is assumed that the
dependence between individuals is linear. However, this assumption is often too strong, as
there could be nonlinear interactions between the individuals. In Section 5 we consider
estimation for nonparametric panel time series, but allow for nonlinear dependence between
individuals by quantifying their dependence through their 2-mixing sizes. Let $X_{t,i}$ denote
the observation of the $i$th individual at time $t$, where $i = 1, \ldots, N$ and $t = 1, \ldots, T$.
We also assume that we observe some explanatory variables $Z_{t,i}$ which influence $X_{t,i}$. We
suppose the influence is common over all individuals, that is the response and explanatory
variables $(X_{t,i}, Z_{t,i})$ satisfy the relation $E[X_{t,i}|Z_{t,i} = z] = \varphi(z)$ for all $i \le N$ and
$t \in \mathbb{Z}$. We propose a kernel based estimator for $\varphi(\cdot)$ and derive bounds for the
deviation. In panel data it is often observed that there is temporal dependence for each
individual and also dependence between individuals. We model this by assuming two different
2-mixing sizes. We show that the rate of convergence of the estimator of $\varphi$, when the number
of individuals $N$ is kept fixed and $T \to \infty$, is similar to the rate of convergence of the
nonparametric estimator of model (ii) considered in Section 4. However, we show that the
rates can be improved if we allow $N$ to increase with $T$. Furthermore, if the mixing size is
sufficiently large we can obtain the usual nonparametric rate of convergence obtained for
iid random variables.

All the proofs can be found in the appendix. Also some 2-mixing inequalities for linear 
processes used here are included in the appendix. 



2 Notation 

In this section we introduce some definitions that will be used in the paper. Note we will 
assume all the necessary densities exist. 

We start by defining the multiplicative kernel. 

Definition 2.1. For all $w = (w_1, \ldots, w_d) \in \mathbb{R}^d$, $K$ is a multiplicative kernel (see Scott
[1992]) of order $r$, i.e. $K(w) = \prod_{i=1}^{d} \ell(w_i)$ where $\ell$ is a univariate, even function such that

$$\int \ell(u)\,du = 1, \qquad \int u^{i}\,\ell(u)\,du = 0$$

for all $i = 1, \ldots, r - 1$ and there exists a constant $S_K$ such that

$$\Big[\int |u|^{r}\,\ell(u)\,du\Big]^{d} = S_K.$$

Let $K_b(z) := b^{-d}K(z/b)$, where $b > 0$ is a bandwidth. Below we define the smoothness
class which we use to bound the bias of the estimators.

Definition 2.2. For $s, A > 0$, the space $\mathcal{G}_{s,A}$ is the class of functions $g : \mathbb{R}^d \to \mathbb{R}$
satisfying: $g$ is everywhere $(m - 1)$-times partially differentiable for $m - 1 < s \le m$; where for
some $p > 0$ and for all $x$, the inequality

$$\sup_{y : |y - x| < p} \frac{|g(y) - g(x) - Q(y)|}{|y - x|^{s}} \le A$$

holds true with $Q = 0$ when $m = 1$ and, for $m > 1$, $Q$ is an $(m - 1)$th-degree homogeneous
polynomial in $y - x$, whose coefficients are the partial derivatives of $g$ of orders 1 to $m - 1$
evaluated at $x$; and $A$ is a finite constant.

For brevity, we use the standard notation $\wedge$ to denote minimum and $\vee$ to denote maximum.



3 Kernel density estimation 

Suppose we observe the stationary time series $\{Z_1, \ldots, Z_T\}$, and let $f$ denote the density
of $Z_t$. The most popular estimator of $f$ is the Rosenblatt-Parzen kernel estimator

$$\hat{f}(u) = \frac{1}{T}\sum_{t=1}^{T} K_b(Z_t - u), \tag{3.1}$$

where $K_b(z) = b^{-1}K(z/b)$, $b > 0$ is a bandwidth, and $K$ is a multiplicative kernel (see
Definition 2.1). In this section we investigate the sampling properties of the kernel density
estimator defined above. The dependence of the process $\{Z_t\}$ is quantified in terms of its
2-mixing size.
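As an illustration of (3.1), the following is a minimal sketch of the estimator for univariate data, assuming the rectangular kernel used later in Proposition 3.1. The function names `rectangular_kernel` and `kernel_density` are hypothetical and chosen only for this example.

```python
def rectangular_kernel(x):
    """Rectangular kernel: K(x) = 1 on [-1/2, 1/2] and 0 otherwise."""
    return 1.0 if -0.5 <= x <= 0.5 else 0.0


def kernel_density(sample, u, b, kernel=rectangular_kernel):
    """Rosenblatt-Parzen estimate at u: (1/T) * sum_t K_b(Z_t - u),
    with K_b(z) = K(z / b) / b for a univariate sample."""
    if b <= 0:
        raise ValueError("bandwidth b must be positive")
    if not sample:
        raise ValueError("sample must be non-empty")
    T = len(sample)
    return sum(kernel((z - u) / b) for z in sample) / (T * b)


# Three of the four observations fall inside the window [u - b/2, u + b/2].
estimate = kernel_density([0.0, 0.1, 0.2, 0.9], u=0.1, b=0.4)
```

The sketch makes no use of the time ordering of the observations; the dependence only enters through the sampling properties of the estimate, which is what the 2-mixing size is used to control.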

Definition 3.1. (i) A process $\{Y_t\}$ is said to be 2-mixing with size $v$ if for all $t \neq \tau$

$$\sup_{A \in \sigma(Y_t),\, B \in \sigma(Y_\tau)} |P(A \cap B) - P(A)P(B)| \le C|t - \tau|^{-v}$$

for some $C < \infty$ independent of $t$ and $\tau$.

(ii) The covariance of a stationary process $\{Y_t\}$ has size $u$ if for all $t \neq \tau$,
$|\mathrm{cov}(Y_t, Y_\tau)| \le C|t - \tau|^{-u}$ for some $C < \infty$ independent of $t$ and $\tau$.

We note that the covariance is a measure of linear dependence, whereas 2-mixing is a
generalisation of this, and can be considered as a measure of general dependence. 2-mixing
is quite a general notion, which is satisfied by several processes. For example, under certain
conditions on the innovations, most linear models are 2-mixing (see Appendix A.4, and
Athreya and Pantula [1986], Cline and Pu [1999] and Chanda [1974], where strong mixing is
shown). Further, under additional conditions on the innovations and the parameters,
ARCH/GARCH processes are also strongly mixing (cf. Masry and Tjøstheim [1995], Boussama
[1998] and Basrak et al. [2002]), which implies that they are also 2-mixing. Most of the
results and bounds in this paper are derived using 2-mixing. In general, the larger the mixing
size the faster the rate of convergence. For example, in the case of iid observations (where
the 2-mixing size can be treated as $\infty$), information over the entire domain of the density
function can be obtained using just a few observations. On the other hand, a sample which
has a small mixing size (and so tends to be clustered about certain points) will require a
much larger number of observations to give the same information.

We first derive a bound for the mean squared error (MSE) $E|\hat{f}(z) - f(z)|^2$ using only
minimal assumptions on the distribution of $\{Z_t\}$.

Proposition 3.1. Suppose the stationary process $\{Z_t\}$ is 2-mixing with size $v$ and the
marginal density $f$ of $Z_t$ and its second derivative $f''$ are both uniformly bounded. Let $\hat{f}$ be
defined as in (3.1), where $K$ is a rectangular kernel, i.e., $K(x) = 1$ if $x \in [-1/2, 1/2]$ and
zero otherwise. Then we have

$$E|\hat{f}(z) - f(z)|^2 = O\big(b^{4} + T^{-[v \wedge 1]}\, b^{-[(v \vee 1)^{-1} + 1]}\big) =
\begin{cases}
O\big(b^{4} + T^{-1} b^{-1 - 1/v}\big), & v > 1; \\
O\big(b^{4} + T^{-v} b^{-2}\big), & v < 1.
\end{cases}$$



Proof. To prove the result we will bound the risk using the standard bias-variance
decomposition. First the bias: as we are using a rectangular kernel and $f''$ is uniformly
bounded, it is clear that $E(\hat{f}(z)) = f(z) + O(b^{2})$. To obtain a bound for the variance
we require a bound for the covariances inside the variance expansion $T^{2} \cdot \mathrm{var}(\hat{f}(z)) =
\sum_{t,\tau} \mathrm{cov}[K_b(Z_t - z), K_b(Z_\tau - z)]$. Since $\{Z_t\}$ is 2-mixing with size $v$, by using the
covariance inequality in Bradley [1996] (see also Rio [1993]) we have

$$|\mathrm{cov}[K_b(Z_t - z), K_b(Z_\tau - z)]| \le 4 \int_0^\infty\!\!\int_0^\infty \min\big(C|t - \tau|^{-v},\, P(|K_b(Z_t - z)| > x),\, P(|K_b(Z_\tau - z)| > y)\big)\,dx\,dy. \tag{3.2}$$

Studying $P(|K_b(Z_t - z)| > x)$ and recalling that $K(\cdot)$ is a rectangular kernel we can show
that

$$P(|K_b(Z_t - z)| > x) =
\begin{cases}
0, & \text{if } x > 1/b; \\
P(Z_t \in [z - b/2, z + b/2]), & \text{otherwise}.
\end{cases}$$

By using the mean value theorem we have $P(Z_t \in [z - b/2, z + b/2]) = b f(\tilde{z})$, for some
$\tilde{z} \in [z - b/2, z + b/2]$. Substituting this into (3.2) leads to

$$|\mathrm{cov}[K_b(Z_t - z), K_b(Z_\tau - z)]| \le 4 \int_0^{1/b}\!\!\int_0^{1/b} \min\big(C|t - \tau|^{-v},\, b \cdot f(\tilde{z})\big)\,dx\,dy
= 4 b^{-2} \min\big(C|t - \tau|^{-v},\, b \cdot f(\tilde{z})\big). \tag{3.3}$$

Altogether this yields the bound

$$T^{2} \cdot \mathrm{var}(\hat{f}(z)) \le 4 \sum_{t,\tau} b^{-2} \min\big(C|t - \tau|^{-v},\, b \cdot f(\tilde{z})\big).$$

Examining the minimum inside the summand above, we partition the sum into two parts
which we bound separately (for the details see the proof of Theorem 3.2, in the Appendix).
Finally recalling that $[E(\hat{f}(z)) - f(z)]^{2} = O(b^{4})$ leads to the desired result. □
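The variance bound at the end of the proof can be evaluated numerically. The sketch below is illustrative only: it assumes the constants $C$ and $\sup f$ are both 1 by default, treats the diagonal terms $t = \tau$ as attaining the cap $b \cdot f$, and groups the double sum by lag. The function name `variance_bound` is hypothetical.

```python
def variance_bound(T, b, v, C=1.0, f_sup=1.0):
    """Evaluate 4 * T^{-2} * b^{-2} * sum_{t,tau} min(C|t-tau|^{-v}, b*f_sup).

    The double sum is grouped by lag k = |t - tau|: lag 0 contributes T
    terms equal to the cap b*f_sup, and each lag k >= 1 contributes
    2*(T - k) terms. C and f_sup are placeholder constants (assumptions).
    """
    if T < 1 or b <= 0 or v <= 0:
        raise ValueError("need T >= 1, b > 0 and v > 0")
    cap = b * f_sup
    total = T * cap  # lag-0 terms
    for k in range(1, T):
        total += 2 * (T - k) * min(C * k ** (-v), cap)
    return 4.0 * total / (b * b * T * T)


# A larger mixing size gives a smaller bound for the same T and b.
fast_mixing = variance_bound(T=200, b=0.1, v=2.0)
slow_mixing = variance_bound(T=200, b=0.1, v=0.5)
```

Splitting the sum at the lag where $C k^{-v}$ drops below $b \cdot f$ is the partition referred to in the proof; the loop above simply evaluates both parts at once.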

We observe, in the proof above, that besides the 2-mixing condition we do not have any
assumptions on the joint distribution of $(Z_t, Z_\tau)$. The cost of using such weak assumptions
is that the usual bound $O(b^{4} + (bT)^{-1})$ for the MSE, obtained for independent observations,
is not achieved. Even for a large 2-mixing size $v$, the mixing size has an influence on the
bound. However, introducing some assumptions on the joint densities of $\{Z_t\}$ allows us to
tighten the bound derived in (3.3) and hence, for a sufficiently large mixing size $v$, to recover
the usual bound $O(b^{4} + (bT)^{-1})$ for the MSE (we note that the rest of the proofs in this section
and the subsequent sections require more subtle arguments, and these can be found in the
appendix).

Assumption 3.1 (Densities and kernels). (i) The marginal density $f$ is uniformly bounded.

(ii) For each $t, \tau \in \mathbb{Z}$ let $f^{(t,\tau)}$ denote the joint density of $(Z_t, Z_\tau)$. Define¹
$F^{(t,\tau)} = f^{(t,\tau)} - f \otimes f$. Then $\|F^{(t,\tau)}\|_{p_F}$ is uniformly bounded in $t$ and $\tau$ for some
$p_F > 2$ and we define $q_F = 1 - 2/p_F$.

(iii) The univariate kernel $K$ is uniformly bounded and has a finite first and second
moment, i.e., $\|K\|_1 < \infty$ and $\|K\|_2 < \infty$.

¹We use the notation $f \otimes g(x, y) = f(x)g(y)$ and $\|f\|_p = (\int |f(x)|^{p}\,dx)^{1/p}$.

We use these assumptions to derive a bound for the MSE of the density estimator.

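Theorem 3.2. Let us suppose the stationary time series $\{Z_t\}$ is 2-mixing with size $v$ and
Assumption 3.1 is fulfilled for some $q_F \in (0, 1)$.

Let $\hat{f}$ be defined as in (3.1), where $K$ is a univariate kernel of order $r$. In addition
assume that $f \in \mathcal{G}_{s,A}$ for some $A, s > 0$ (see Definition 2.2), and let $\rho = r \wedge s$. Then we
have for all $z \in \mathbb{R}$

$$E|\hat{f}(z) - f(z)|^2 = O\big(b^{2\rho} + b^{-1} T^{-1} + b^{-(2 + q_F(1 - [v \vee 1]))}\, T^{-[v \wedge 1]}\big).$$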



For ease of presentation we have only stated the result for univariate {Zt}, however it is 
straightforward to extend this result for multivariate {Zt}. Indeed, the proof of the theorem 
given in the Appendix is derived for random vector {Zt} (as we require the multivariate 
case in Section 4). 

Remark 3.1. We note that in the bound given in Theorem 3.2 the second term dominates
the third term when the 2-mixing size satisfies $v > 1 + 1/q_F$. Conversely, when $v < 1 + 1/q_F$
the third term dominates the second term. Moreover, the third term can be partitioned into
two further cases, when $1 < v < 1 + 1/q_F$ and when $v < 1$. This means that Theorem 3.2 can
be written as

$$E|\hat{f}(z) - f(z)|^2 =
\begin{cases}
O\big(b^{2\rho} + b^{-1} T^{-1}\big), & v > 1 + 1/q_F; \\
O\big(b^{2\rho} + b^{-(2 + q_F(1 - v))}\, T^{-1}\big), & 1 < v < 1 + 1/q_F; \\
O\big(b^{2\rho} + b^{-2}\, T^{-v}\big), & v < 1.
\end{cases}$$

Studying the three bounds, we see that the rate exponent increases linearly with $v$ for
$0 < v < 1$; after this point there is a change in behaviour and the increase is more gradual.
The bound plateaus when $v > 1 + 1/q_F$; after this point we have the usual nonparametric
bound obtained for iid observations. There is also a continuity in the three bounds. More
precisely, when $v$ is at the boundaries $1$ and $1 + q_F^{-1}$, there is a continuous transition
between the bounds. □
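The continuity between the regimes in Remark 3.1 can be checked numerically. The sketch below assumes the three-term form of the bound, $b^{2\rho} + b^{-1}T^{-1} + b^{-(2 + q_F(1 - [v \vee 1]))}T^{-[v \wedge 1]}$ (the $d = 1$ case of the bound quoted in (4.4)), with all constants set to one. The function names are hypothetical.

```python
def mse_bound_terms(b, T, v, rho, q_F):
    """Return the three terms of the assumed bound (constants set to 1):
    squared bias, iid-type variance, and the dependence term."""
    bias_sq = b ** (2 * rho)
    iid_variance = 1.0 / (b * T)
    dependence = b ** (-(2 + q_F * (1 - max(v, 1.0)))) * T ** (-min(v, 1.0))
    return bias_sq, iid_variance, dependence


def regime(v, q_F):
    """Classify the 2-mixing size v into the three cases of Remark 3.1."""
    if v > 1 + 1 / q_F:
        return "iid-like"
    if v > 1:
        return "intermediate"
    return "strong dependence"


# At the boundary v = 1 + 1/q_F the dependence term equals the iid term.
_, iid_term, dep_term = mse_bound_terms(b=0.2, T=1000, v=3.0, rho=2, q_F=0.5)
```

For $v$ above the boundary the dependence term is the smaller of the two variance terms, which is why the usual nonparametric bound is recovered there.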

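We now consider the rate of convergence using the optimal bandwidth $b^{*}$.

Corollary 3.3. Suppose the assumptions in Theorem 3.2 are satisfied and $r > s$. Let
$b^{*} \sim T^{-\gamma/(2s+1)}$ with

$$\gamma =
\begin{cases}
1, & v > 1 + 1/q_F; \\
[v \wedge 1]\,\dfrac{2s + 1}{2s + 2 + q_F(1 - [v \vee 1])}, & 1 + 1/q_F > v.
\end{cases} \tag{3.4}$$

Then for all $z \in \mathbb{R}$ we have $E|\hat{f}(z) - f(z)|^2 = O\big(T^{-2s\gamma/(2s+1)}\big)$ as $T \to \infty$.

In other words, if $b^{*} \sim T^{-\gamma/(2s+1)}$ then we have

$$E|\hat{f}(z) - f(z)|^2 =
\begin{cases}
O\big(T^{-\frac{2s}{2s+1}}\big), & v > 1 + 1/q_F; \\
O\big(T^{-\frac{2s}{2s + 2 + q_F(1 - v)}}\big), & 1 + 1/q_F > v > 1; \\
O\big(T^{-\frac{vs}{s+1}}\big), & 1 > v.
\end{cases} \tag{3.5}$$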

We note that if $\sup_z |f(z)| < \infty$ and $\sup_{t,\tau}\sup_z |f^{(t,\tau)}(z)| < \infty$ (both the density and
the joint densities are uniformly bounded), then uniformly in all $t, \tau$, $\|F^{(t,\tau)}\|_\infty < \infty$. This
means $q_F = 1$, and the bound can be divided into the three cases where the 2-mixing size
satisfies $v < 1$, $1 < v < 2$ and $v > 2$. On the other hand, when $\|F^{(t,\tau)}\|_{p_F} < \infty$ for only a
finite $p_F$, then $q_F < 1$ and we require $v > 1 + q_F^{-1} > 2$ to be sure of the usual nonparametric
bound.

Referring to Corollary 3.3, we observe that when $v < 1 + q_F^{-1}$, the optimal bandwidth
$b^{*}$ is much larger than the usual optimal bandwidths encountered in nonparametric regression
($b \sim T^{-1/(2s+1)}$). We discuss this further in Section 3.2.
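To see how the optimal bandwidth and rate change with the mixing size, the following sketch computes the exponent $\gamma$ in $b^{*} \sim T^{-\gamma/(2s+1)}$ and the resulting rate exponent. It assumes $\gamma$ is obtained by balancing $b^{2s}$ against the dependence term $b^{-(2 + q_F(1 - [v \vee 1]))}T^{-[v \wedge 1]}$, which reproduces the three rates in (3.5). The function names are hypothetical.

```python
def bandwidth_gamma(v, s, q_F):
    """Exponent gamma in b* ~ T^{-gamma / (2s + 1)}, assuming the bias
    term b^{2s} is balanced against the dominant variance term."""
    if v > 1 + 1 / q_F:
        return 1.0
    return min(v, 1.0) * (2 * s + 1) / (2 * s + 2 + q_F * (1 - max(v, 1.0)))


def rate_exponent(v, s, q_F):
    """Exponent delta such that the MSE bound is O(T^{-delta})."""
    return 2 * s * bandwidth_gamma(v, s, q_F) / (2 * s + 1)


# For small mixing sizes gamma < 1, i.e. the bandwidth is larger than
# the usual T^{-1/(2s+1)} choice.
gamma_small_v = bandwidth_gamma(v=0.5, s=2, q_F=1.0)
```

With $v < 1$ the rate exponent reduces to $vs/(s+1)$, and at $v = 1 + 1/q_F$ the exponent $\gamma$ reaches one, so the usual bandwidth is recovered continuously.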

3.1 A comparison of the MSEs for linear processes 

In this section we compare the MSE in Theorem 3.2 with the results obtained under the
stronger condition that the observations $\{Z_t\}$ come from a linear process. We will use the
results in Appendix A.4 and show that if the process is linear, and not just mixing, then
the rate of convergence is better than the rate obtained in Corollary 3.3. However,
in Section 3.2 we demonstrate that misspecifying the process as linear can lead to
several problems with the density estimator, including bounds which do not converge to
zero.

Let us suppose $\{Z_t\}$ has a linear process representation and satisfies

$$Z_t = \sum_{j=0}^{\infty} a_j \varepsilon_{t-j}, \tag{3.6}$$

where the innovations $\{\varepsilon_t\}$ are iid. Under the assumptions in Lemma A.11 (see the
Appendix), it can be shown that $\mathrm{cov}(K_b(Z_0), K_b(Z_t)) = O(\mathrm{cov}(Z_0, Z_t))$. Using this as the
basis, Hall and Hart [1990a], Giraitis et al. [1996], Mielniczuk [1997] and Estevas and Vieu
[2003] have shown that if the kernel is of order $r > s$ (where $r$ is the order of the kernel
and $s$ is the smoothness of $f$, see Definitions 2.1 and 2.2), the MSE is

$$E|\hat{f}(z) - f(z)|^2 = O\Big(b^{2s} + \frac{1}{bT} + \frac{1}{T} R_T\Big), \tag{3.7}$$

where $R_T = \sum_{t=1}^{T} |\mathrm{cov}(Z_0, Z_t)|$. It is clear that both $\mathrm{cov}(Z_0, Z_t)$ and $R_T$ depend on the
rate of decay of the parameters $a_j$. We observe if $|a_j| \le C j^{-\theta}$, then

$|\mathrm{cov}(Z_0, Z_t)| = O(t^{-2\theta + 1})$ and $R_T = O((\log T)\, T^{-(2\theta - 1) + 1})$ if $1/2 < \theta < 1$;
$|\mathrm{cov}(Z_0, Z_t)| = O(t^{-\theta})$ and $R_T = O(T^{-\theta + 1})$ if $\theta > 1$.

Substituting these rates into (3.7) we see that the bound of the MSE depends on $\theta$. We recall
that a process $\{Z_t\}$ is called a short memory process if $\sum_t |\mathrm{cov}(Z_0, Z_t)| < \infty$, otherwise it
is called a long memory process. Now studying (3.7) we see that $R_T$ does not depend on
the bandwidth $b$. In other words long memory has no influence on the choice of the optimal
bandwidth. To summarise, the rate of convergence for observations coming from a linear
process is

$$E|\hat{f}(z) - f(z)|^2 \le
\begin{cases}
O\big(T^{-(2\theta - 1)}\big), & \text{if } 2\theta - 1 < \frac{2s}{2s+1}; \\
O\big(T^{-\frac{2s}{2s+1}}\big), & \text{otherwise}.
\end{cases} \tag{3.8}$$

We now compare these results to (3.5). In particular, when the 2-mixing size satisfies $v < 1$,
we have

$$E|\hat{f}(z) - f(z)|^2 = O\big(T^{-\frac{vs}{s+1}}\big). \tag{3.9}$$

It is difficult to directly compare (3.8) and (3.9), since (3.8) is in terms of the long memory
parameter $\theta$ whereas (3.9) is in terms of the mixing size $v$. However, in the special case that
$\{Z_t\}$ is Gaussian (and thus linear), there is a one-to-one correspondence: for example, if
$2\theta - 1 < 1$ then the covariance size and mixing size are the same, and $v = (2\theta - 1)$. Noting
that the Gaussian density is analytic, the rate of convergence is determined by the order of
the kernel $r$. In this case, the rates in (3.8) are better than those in (3.9), but as the order
$r$ increases the two rates become close. We illustrate the case when the mixing and the
covariance sizes are the same in Figure 1 (for both large and small $r \wedge s$). In the non-Gaussian
case, where the 2-mixing and covariance size do not necessarily coincide, $v \neq (2\theta - 1)$, the
mixing size $v$ is bounded below and above by multiples of $(2\theta - 1)$, under a moment
condition on the innovations $\varepsilon_0$ (see (A.43) in the appendix). In this case it is not clear
which rate, (3.8) or (3.9), is better. However, substituting the lower bound for $v$ into
Corollary 3.3 yields a rate which is slower than (3.8). In summary, better rates of convergence
can often be obtained if the observations come from a linear process. On the other hand,
2-mixing is a weaker condition that is satisfied by a far wider class of processes. We consider
below the MSE for processes which are not linear, and show that misspecifying the model by
assuming linearity when the process is nonlinear could severely affect the MSE.

3.2 The MSE for nonlinear processes 

As far as we are aware, there is a gap in the theory for processes which are nonlinear
but have a small mixing size. One of the main aims of Theorem 3.2 is to fill this gap,
and to derive a bound for the MSE when the observations come from nonlinear
processes with small 2-mixing size.

The joint densities of processes which are nonlinear do not necessarily satisfy the density
decomposition in Lemma A.11. Without this result it cannot be shown that
$\mathrm{cov}(K_b(Z_0), K_b(Z_t)) = O(\mathrm{cov}(Z_0, Z_t))$, and the rates in (3.8) do not necessarily hold.
Instead, to prove the results, under Assumption 3.1, we use classical mixing inequalities to
tighten the bound given in (3.3) (see the proof of Proposition 3.1). More precisely, to prove
Theorem 3.2 we show that

$$|\mathrm{cov}(K_b(Z_t - z), K_b(Z_\tau - z))| \le C \cdot b^{-2} \min\big(|t - \tau|^{-v},\, b^{(1 + q_F)}\big),$$

where $v$ is the 2-mixing size and $C$ is a finite constant (see Lemma A.1 for the proof).

Figure 1: The top and bottom plots correspond to $\rho = (r \wedge s) = 1$ and $\rho = (r \wedge s) = 5$,
respectively. The x-axis is the covariance and mixing size (assuming both are the same)
and the y-axis is the index $\delta$ in the MSE $E|\hat{f}(x) - f(x)|^2 = O(T^{-\delta})$. The solid line is the
MSE using 2-mixing and the dotted line is the MSE when $\{Z_t\}_t$ is a linear process. We have
assumed that $q_F = 1$ (in other words $\|F^{(t,\tau)}\|_\infty < \infty$).

Looking at some of the implications of Theorem 3.2, we demonstrate below that several
problems could arise if one were to falsely suppose that the observations come from a linear
process when they do not.

(i) In the case of linear processes, the optimal bandwidth has the same order as the
optimal bandwidth for iid random variables (regardless of long memory). The same is
not necessarily true when all that is known is that the process is 2-mixing. Moreover,
if the mixing size satisfies $v < 1$ and the bandwidth is such that $b^{2} T^{v} < \infty$, then
we see from Theorem 3.2 that the bound does not converge to zero. An important
example is when the 'usual' optimal bandwidth for linear or iid data is used (that
is $b \sim T^{-1/(2s+1)}$). In this case, substituting $b \sim T^{-1/(2s+1)}$ into Theorem 3.2 leads to the
result

$$E|\hat{f}(z) - f(z)|^2 =
\begin{cases}
O\big(T^{-\frac{2s}{2s+1}}\big), & v > 1 + 1/q_F; \\
O\big(T^{\frac{1 + q_F(1 - v) - 2s}{2s+1}}\big), & 1 < v < 1 + 1/q_F; \\
O\big(T^{\frac{2 - v(2s+1)}{2s+1}}\big), & 0 < v < 1.
\end{cases}$$

Studying the rates above we see that when $v < 1 + 1/q_F$, the rates are slower than the
rates given using the optimal bandwidth (compare the above with the rates in
Corollary 3.3). Moreover, in the case that $v < \frac{2}{2s+1}$, the bound cannot be used to show
consistency of the estimator, since the bound does not even converge to zero.

In short, to estimate the density at any given point, the number of observations
(approximately $bT$) needs to be much larger than in the iid case.

(ii) Rather surprisingly, even when $\sum_{j=1}^{\infty} |\mathrm{cov}(Z_0, Z_j)| < \infty$, the 'usual' $O(T^{-\frac{2s}{2s+1}})$ rate
may not hold, unlike for linear processes. However, the usual rate does hold when
$v > 1 + 1/q_F > 2$. Therefore, even when the mixing and covariance size are the
same, a far larger mixing size may be required to obtain the 'usual' $O(T^{-\frac{2s}{2s+1}})$ rate of
convergence.

Our results give a cautionary warning to practitioners who apply the optimal
bandwidths for linear processes to nonlinear processes. In the subsequent sections, where we
consider nonparametric regression problems, the assumptions and proofs will be more
involved; however, the underlying message is the same. That is, more than just the second
order autocovariance function may have influence on the rate of convergence, and the rate
of convergence can be severely compromised if the usual bandwidths were used.
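The effect of using the usual bandwidth $b \sim T^{-1/(2s+1)}$ can be made concrete with a short sketch. It assumes the bound of Theorem 3.2 has the three-regime form discussed in Remark 3.1 and returns the exponent $\delta$ in $O(T^{\delta})$; a non-negative $\delta$ means the bound does not converge to zero. The function name is hypothetical.

```python
def usual_bandwidth_exponent(v, s, q_F):
    """Exponent delta such that the assumed MSE bound is O(T^delta) when
    the 'usual' bandwidth b ~ T^{-1/(2s+1)} is used with mixing size v."""
    if v > 1 + 1 / q_F:
        return -2 * s / (2 * s + 1)
    if v > 1:
        return (1 + q_F * (1 - v) - 2 * s) / (2 * s + 1)
    return (2 - v * (2 * s + 1)) / (2 * s + 1)


# With s = 1 the threshold 2 / (2s + 1) is 2/3: below it the bound diverges.
diverging = usual_bandwidth_exponent(v=0.3, s=1, q_F=1.0)
converging = usual_bandwidth_exponent(v=0.8, s=1, q_F=1.0)
```

Comparing these exponents with those obtained under the optimal bandwidth of Corollary 3.3 shows how much of the rate is lost by the misspecified bandwidth.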

Remark 3.2 (Example). It is almost impossible to estimate the 2-mixing size from the
observations, in contrast to long memory (cf. Geweke and Porter-Hudak [1983], Künsch
[1987] and Robinson [1995]). However, to conclude this section we give an example of a
nonlinear process whose 2-mixing size is less than $1 + \delta$, for some $\delta > 0$. Let us consider
the ARCH($\infty$) process (see Robinson [1991]), where $\{Z_t\}$ satisfies

$$Z_t = \sigma_t \varepsilon_t, \qquad \sigma_t^{2} = a_0 + \sum_{j=1}^{\infty} a_j Z_{t-j}^{2},$$

with $E(\varepsilon_t) = 0$ (estimation of ARCH($\infty$) parameters is considered in Subba Rao [2006]).
Giraitis et al. [2000] have shown that if for large $t$, $a_t \sim t^{-(1+\delta)}$ (for some $\delta > 0$) and
$[E(\varepsilon_0^{4})]^{1/2} \sum_{j=1}^{\infty} a_j < 1$, then $|\mathrm{cov}(Z_0^{2}, Z_t^{2})| \sim t^{-(1+\delta)}$. That is, the absolute sum of the
covariances is finite, but 'only just' if $\delta$ is small. Furthermore, if we assume that $|\varepsilon_t| \le 1$,
then it is straightforward to show that $Z_t$ is a bounded random variable. This means that
using the mixing inequality for bounded random variables (see Hall and Heyde [1980]) we
can show

$$|\mathrm{cov}(Z_0^{2}, Z_t^{2})| \le C \sup_{A \in \sigma(Z_0),\, B \in \sigma(Z_t)} |P(A \cap B) - P(A)P(B)|,$$

for some $C < \infty$. Altogether this implies that the 2-mixing size $v$ of the
ARCH($\infty$) process with $a_t \sim t^{-(1+\delta)}$ satisfies $v \le (1 + \delta)$.

In other words the 2-mixing size for some ARCH($\infty$) processes is small, and far from the
geometric rate often assumed in nonparametric estimation. A lower bound for the 2-mixing
size can be found in Subba Rao [2007]. □
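For readers who wish to experiment with such a process, the sketch below simulates a truncated version of the ARCH($\infty$) recursion. Several choices here are assumptions made only for illustration: the recursion is truncated at a finite number of lags, the coefficients are taken to be exactly $a_j = c\, j^{-(1+\delta)}$, the innovations are uniform on $[-1, 1]$ (so that $|\varepsilon_t| \le 1$ and $E(\varepsilon_t) = 0$), and the process is started from a zero history. The function names are hypothetical.

```python
import math
import random


def arch_coefficients(c, delta, n_lags):
    """Illustrative coefficients a_j = c * j^{-(1 + delta)}, j = 1..n_lags."""
    return [c * j ** -(1.0 + delta) for j in range(1, n_lags + 1)]


def simulate_truncated_arch(n, a0, coeffs, seed=0):
    """Simulate Z_t = sigma_t * eps_t with
    sigma_t^2 = a0 + sum_j a_j * Z_{t-j}^2, truncated at len(coeffs) lags,
    using eps_t uniform on [-1, 1] and a zero pre-sample history."""
    rng = random.Random(seed)
    squares = []  # past values of Z_t^2, most recent last
    path = []
    for _ in range(n):
        recent = squares[-len(coeffs):][::-1] if coeffs else []
        sigma_sq = a0 + sum(a * zs for a, zs in zip(coeffs, recent))
        z = math.sqrt(sigma_sq) * rng.uniform(-1.0, 1.0)
        path.append(z)
        squares.append(z * z)
    return path


coeffs = arch_coefficients(c=0.2, delta=0.1, n_lags=50)
path = simulate_truncated_arch(n=500, a0=1.0, coeffs=coeffs, seed=1)
```

When the coefficients sum to less than one, an induction on $t$ gives $Z_t^{2} \le \sigma_t^{2} \le a_0 / (1 - \sum_j a_j)$, which is the boundedness used in the remark.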



4 Nonparametric regression 

In this section we consider nonparametric regression, with random design, where the
observations are dependent. It is worth mentioning that there has been extensive research done
on nonparametric regression with fixed design and dependent errors (cf. Hall and Hart
[1990b], Csörgő and Mielniczuk [1995], and the references therein). In this case, typically
one observes $Y_t$, where $Y_t = \varphi(t/T) + \varepsilon_t$ and $\{\varepsilon_t\}_t$ are stationary random variables with
varying degrees of dependence. It has been shown that the rate of convergence depends on the
covariance of $\{\varepsilon_t\}_t$, in particular their absolute sum, $\sum_{t=1}^{\infty} |\mathrm{cov}(\varepsilon_0, \varepsilon_t)|$.

In the random design model, one observes the stationary $(1 + d)$-dimensional vector time
series $\{(X_t, Z_t)\}_t$, where

$$X_t = \varphi(Z_t) + \varepsilon_t \tag{4.1}$$

with $E(X_t|Z_t = z) = \varphi(z)$ and $\varepsilon_t = X_t - E(X_t|Z_t)$. The randomness in this model is
determined by two factors: the design $\{Z_t\}$ and the errors $\{\varepsilon_t = X_t - E(X_t|Z_t)\}$. Therefore,
unlike the fixed design model, the rate of convergence of any estimator of $\varphi$ must depend
on the sampling properties of the design density estimator. Thus, it is clear that similar
results to those in Section 3 should also apply to an estimator of $\varphi$.

We now define the classical Nadaraya-Watson estimator of $\varphi(\cdot)$ and study its sampling
properties, under various assumptions on $\{(X_t, Z_t)\}$. Let $p(x, z)$ be the joint density of
$(X_t, Z_t)$. The estimator is

$$\hat{\varphi}(z) = \frac{\hat{g}(z)}{\hat{f}(z)}, \tag{4.2}$$

where $\hat{g}(z) := \frac{1}{T}\sum_{t=1}^{T} X_t K_b(Z_t - z)$ and $\hat{f}(z) := \frac{1}{T}\sum_{t=1}^{T} K_b(Z_t - z)$ are estimators of
$g(z) = \int x\, p(x, z)\,dx$ and $f(z)$, which is the density of $Z_t$.
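A minimal sketch of the estimator (4.2) for a univariate design is given below, assuming a rectangular kernel for concreteness. The function names are hypothetical. Since the factor $1/(Tb)$ appears in both $\hat{g}$ and $\hat{f}$, it cancels in the ratio and is omitted.

```python
def rectangular_kernel(x):
    """Rectangular kernel: K(x) = 1 on [-1/2, 1/2] and 0 otherwise."""
    return 1.0 if -0.5 <= x <= 0.5 else 0.0


def nadaraya_watson(xs, zs, z, b, kernel=rectangular_kernel):
    """Nadaraya-Watson estimate g_hat(z) / f_hat(z) for responses xs and
    univariate design points zs; the common factor 1/(T*b) cancels."""
    if b <= 0:
        raise ValueError("bandwidth b must be positive")
    if len(xs) != len(zs):
        raise ValueError("xs and zs must have the same length")
    weights = [kernel((zt - z) / b) for zt in zs]
    denominator = sum(weights)
    if denominator == 0:
        raise ValueError("no design points within the kernel window at z")
    return sum(w * x for w, x in zip(weights, xs)) / denominator


# Local average of the responses whose design points lie within b/2 of z.
estimate = nadaraya_watson(xs=[1.0, 2.0, 3.0, 10.0],
                           zs=[0.0, 0.1, 0.2, 0.9], z=0.1, b=0.4)
```

The explicit check on the denominator reflects the requirement, used in the theory, that the design density is bounded away from zero at the point of estimation.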

We first consider the sampling properties for a particular class of models which satisfy
(4.1). Suppose the vector time series $\{(X_t, Z_t)\}$ satisfies the representation

$$X_t = \varphi(Z_t) + h(Z_t)\eta_t \tag{4.3}$$

for some $h : \mathbb{R}^d \to \mathbb{R}_+$, where the time series $\{Z_t\}$ and $\{\eta_t\}$ are independent of each other.
This class of models is similar to the fixed design model $X_t = \varphi(t/T) + \eta_t$, but in (4.3) the
design is random and the conditional variance $\mathrm{var}(X_t|Z_t) = h(Z_t)^{2}\,\mathrm{var}(\eta_t)$ depends on the
design. This model arises in various applications and we consider one application in Remark
4.2. We will show in the theorem below that the rate of convergence depends both on the
mixing size of the design $\{Z_t\}$ and on the size of the covariances of the process $\{\eta_t\}$
(which we denote by $u$, see Definition 3.1).
We require the following assumptions.

Assumption 4.1 (Densities, moments and kernels). (i) For some $p > 2$ the functions
$h^{p} \cdot f$ and $|\varphi|^{p} \cdot f$ are uniformly bounded and we define $q := 1 - 2/p$.

(ii) Let $f^{(t,\tau)}$ and $F^{(t,\tau)}$ be defined as in Assumption 3.1 (ii),

$$g^{(t,\tau)}(z_1, z_2) := E[X_t X_\tau | Z_t = z_1, Z_\tau = z_2] \cdot f^{(t,\tau)}(z_1, z_2),$$

and $G^{(t,\tau)} := g^{(t,\tau)} - g \otimes g$. Then $\|F^{(t,\tau)}\|_{p_F}$ and $\|G^{(t,\tau)}\|_{p_G}$ are uniformly bounded
in $t$ and $\tau$ for some $p_F, p_G > 2$. We define $q_F := 1 - 2/p_F$, $q_G := 1 - 2/p_G$ and
$q_{FG} := q_F \wedge q_G$.

(iii) The multiplicative kernel $K$ has finite first and $p$-th moment.

Studying Assumption 4.1(i), we see that it allows for various types of growth of the
regression function $\varphi$ and the conditional variance $h$. The type of growth depends on
the rate at which the density $f$ decays to zero. For example, if $f$ were the Gaussian density,
then exponential growth of $\varphi$ and $h$ is possible. However, as we shall demonstrate in the
theorem below, the larger the $p$ such that $\sup_x h(x)^{p} \cdot f(x) < \infty$ and $\sup_x |\varphi(x)|^{p} \cdot f(x) < \infty$,
the faster the rate of convergence of $|\hat{\varphi}(z) - \varphi(z)|^{2}$.

Theorem 4.1. Suppose the stationary time series $\{(X_t, Z_t)\}$ satisfies (4.3), $\{Z_t\}$ is
2-mixing with size $v$ and the autocovariance of the time series $\{\eta_t\}$ has size $u$. Suppose
Assumption 4.1 is fulfilled for some $q, q_{FG} \in (0, 1)$.

Let the estimator $\hat{\varphi}(z)$ be defined as in (4.2), where $K$ is a multiplicative kernel of order
$r$. In addition assume that $\varphi \cdot f, f \in \mathcal{G}_{s,A}$ for some $A, s > 0$, $f$ is bounded away from zero
and let $\rho = r \wedge s$. Then we have for all $z \in \mathbb{R}^d$

$$|\hat{\varphi}(z) - \varphi(z)|^{2} = O_p\big(b^{2\rho} + b^{-d}\, T^{-(u \wedge 1)} + b^{-d(1 + q + q_{FG}(1 - [(qv) \vee 1]))}\, T^{-[(qv) \wedge 1]}\big), \qquad T \to \infty.$$

Remark 4.1. We observe that the bound obtained in Theorem 4.1 is similar to the bound
derived for the density estimator in Theorem 3.2, where

$$\mathbb{E}|\hat f(z) - f(z)|^2 = O\big(b^{2\rho} + b^{-d}\, T^{-1} + b^{-d(2+q_F(1-[\mathfrak{v}\vee 1]))}\, T^{-[\mathfrak{v}\wedge 1]}\big), \tag{4.4}$$

noting that the result above is for arbitrary dimension $d$. The difference is the inclusion of
the covariance size $\mathfrak{u}$ of the errors and the $q$ which 'balances' the tails of $\varphi$ and $f$ (see
Assumption 4.1(i)). However, we observe that we can partition the bound in Theorem 4.1
into three cases, which are similar to the three cases considered in Remark 3.1. Most notably,
we observe that if $\mathfrak{u} \ge 1$ and $\mathfrak{v} > 1/q_{FG} + 1/q$ then we obtain the usual bound $O(b^{2\rho} + b^{-d}\, T^{-1})$
for the MSE.

It is interesting to note that in the case that $h^p \cdot f$ and $|\varphi|^p \cdot f$ are uniformly bounded for all
$p$, then $q = 1$ (e.g. $h$ and $\varphi$ are bounded functions and $f$ is the exponential density). In this
case the bounds given in (4.4) and Theorem 4.1 are quite similar. The main differences are
the appearance of $q_{FG}$ rather than $q_F$ and the term $b^{-d}\, T^{-(\mathfrak{u}\wedge 1)}$, which replaces $b^{-d}\, T^{-1}$. □

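Corollary 4.2. Suppose the assumptions of Theorem 4.1 are satisfied. Let $b^* \sim T^{-\gamma/(2\rho+d)}$
with

$$\gamma := \begin{cases} \min(\mathfrak{u}, 1), & q\mathfrak{v} > 1 + q/q_{FG}; \\[4pt] \min\Big(\mathfrak{u},\ [(q\mathfrak{v}) \wedge 1] \cdot \dfrac{2\rho+d}{2\rho+d(1+q+q_{FG}(1-[(q\mathfrak{v})\vee 1]))}\Big), & 1 + q/q_{FG} \ge q\mathfrak{v}, \end{cases}$$

then we have $|\hat\varphi(z) - \varphi(z)|^2 = O_p\big(T^{-2\rho\gamma/(2\rho+d)}\big)$ for all $z \in \mathbb{R}^d$.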
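
As a rough numerical aid, the case distinction in Corollary 4.2 can be sketched in Python. The function name `rate_exponent` and its argument names are illustrative assumptions rather than notation from the paper; `v` stands for the mixing size of the design and `u` for the covariance size of the errors.

```python
def rate_exponent(rho, d, q, q_fg, v, u):
    """Sketch of Corollary 4.2: return (gamma, rate) such that
    b* ~ T**(-gamma / (2*rho + d)) and the squared error is O_p(T**(-rate)).

    rho : smoothness index r ^ s;  d : dimension of Z_t
    q, q_fg : moment indices in (0, 1)
    v : 2-mixing size of the design;  u : covariance size of the errors
    """
    qv = q * v
    if qv > 1.0 + q / q_fg:
        # Design dependence is negligible; only the error covariance matters.
        gamma = min(u, 1.0)
    else:
        exponent = 1.0 + q + q_fg * (1.0 - max(qv, 1.0))
        design = min(qv, 1.0) * (2 * rho + d) / (2 * rho + d * exponent)
        gamma = min(u, design)
    return gamma, 2 * rho * gamma / (2 * rho + d)


# Strongly mixing design and short-memory errors give the usual exponent 2*rho/(2*rho+d).
gamma, rate = rate_exponent(rho=2, d=1, q=0.5, q_fg=0.5, v=10, u=2)
```

At the boundary $q\mathfrak{v} = 1 + q/q_{FG}$ the two branches agree, which is a quick sanity check on the formula.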

Let us now compare Theorem 4.1 with the bound obtained for the deterministic design
$X_t = \varphi(t/T) + \varepsilon_t$, where $\mathfrak{u}$ is the covariance size of the errors. In the case of the fixed design,
the bound for the deviation of the kernel estimator is $O(b^{2\rho} + T^{-(\mathfrak{u}\wedge 1)}\, b^{-d})$ (cf. Hall and Hart
[1990b]). We see that the bound in Theorem 4.1 includes these terms, but also the additional
term $O\big(b^{-d(1+q+q_{FG}(1-[(q\mathfrak{v})\vee 1]))}\, T^{-[(q\mathfrak{v})\wedge 1]}\big)$, which is the influence of the design, in particular
of its mixing size $\mathfrak{v}$. If the mixing size of the design were sufficiently large, then the fixed design and random
design estimators have the same rate of convergence, $O(T^{-2\rho/(2\rho+d)})$.

Remark 4.2 (Example). Examples of processes which satisfy (4.3) are stochastic volatility
models (cf. Linton and Mammen [2004]), where one observes $\{Y_t\}$, which satisfies the
representation

$$Y_t = \sigma(Z_t)\eta_t.$$

Here $\{\eta_t\}$ are iid random variables, $\mathbb{E}(\eta_t^2) = 1$ and $\{Z_t\}$ are explanatory variables which
can include past values of $Y_t$. Usually in finance the object is to estimate the conditional
volatility $\sigma^2$. By noting that $Y_t^2$ can be written as

$$Y_t^2 = \sigma(Z_t)^2 + (\eta_t^2 - 1)\sigma(Z_t)^2,$$

we see that $Y_t^2$ satisfies (4.3) with $X_t = Y_t^2$, $\varepsilon_t = (\eta_t^2 - 1)$ and $h(\cdot) = \sigma(\cdot)^2$. Therefore we
can estimate the volatility $\sigma(\cdot)^2$ using (4.2), where $\hat\sigma(\cdot)^2$ is the kernel estimator of $\sigma(\cdot)^2$.
Furthermore, Theorem 4.1 can be applied to obtain the rate of convergence. More precisely,
let $\mathfrak{v}$ be the mixing size of $\{Z_t\}$, and noting that $\mathrm{cov}\{(\eta_t^2 - 1), (\eta_s^2 - 1)\} = 0$ when $t \ne s$,
which implies $\mathfrak{u} = \infty$, we obtain

$$|\hat\sigma(z)^2 - \sigma(z)^2|^2 = O_p\big(b^{2\rho} + b^{-d(1+q+q_{FG}(1-[(q\mathfrak{v})\vee 1]))} \cdot T^{-[(q\mathfrak{v})\wedge 1]}\big). \quad □$$
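
The volatility example can be illustrated with a small simulation. The sketch below assumes that the estimator in (4.2) is the usual ratio of kernel averages, uses a Gaussian kernel, and invents a design (an AR(1) process) and a variance function purely for illustration; none of these choices come from the paper.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard Gaussian kernel (an assumed choice of K)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def nw_estimate(x, z, grid, b):
    """Kernel regression estimate of E[X_t | Z_t = z0] for each z0 in grid (univariate Z_t)."""
    # weights[k, t] = K_b(Z_t - grid[k]) with K_b(u) = K(u / b) / b
    weights = gaussian_kernel((z[None, :] - grid[:, None]) / b) / b
    g_hat = (weights * x[None, :]).mean(axis=1)  # estimates phi * f
    f_hat = weights.mean(axis=1)                 # estimates the design density f
    return g_hat / f_hat


def sigma2(s):
    """Hypothetical true conditional variance sigma(z)^2."""
    return 0.2 + 0.5 * s**2


rng = np.random.default_rng(0)
T = 5000
eta = rng.standard_normal(T)           # iid noise with E(eta_t^2) = 1
innov = rng.standard_normal(T)
z = np.empty(T)
z[0] = innov[0]
for t in range(1, T):
    z[t] = 0.5 * z[t - 1] + innov[t]   # weakly dependent AR(1) design (assumption)

y = np.sqrt(sigma2(z)) * eta           # Y_t = sigma(Z_t) * eta_t
grid = np.array([-1.0, 0.0, 1.0])
sigma2_hat = nw_estimate(y**2, z, grid, b=0.3)   # regress Y_t^2 on Z_t
```

Regressing $Y_t^2$ rather than $Y_t$ on $Z_t$ is exactly the rewriting used in the remark; the bandwidth `b=0.3` is an arbitrary illustrative value, not the optimal $b^*$.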
From Corollary 4.2 we see that there are two factors which affect the rate of convergence:
the mixing size $\mathfrak{v}$ of the random design $\{Z_t\}$ and the size $\mathfrak{u}$ of the covariance function of
$\{\eta_t\}$. There are however several models of interest which do not satisfy condition (4.3). In
this case Theorem 4.1 cannot be applied and it is of interest to investigate what happens
in the general case.

Examples of models which do not necessarily satisfy (4.3) include the Cheng-Robinson
model, where $\{X_t\}$ satisfies the representation $X_t = F(U_t) + G(U_t, Y_t)$ with $\mathbb{E}(G(U_t, Y_t) \mid U_t) = 0$
and $\{Y_t\}$ is a long memory process, which is independent of the weakly dependent design
random variables $\{U_t\}$ (cf. Cheng and Robinson [1994], Csörgő and Mielniczuk [1999,
2001]). However, the results are derived under the assumption that $\{Y_t\}$ comes from a
linear process and $G(\cdot)$ has a particular form.

An alternative approach is developed in Bosq [1998], who considers nonparametric prediction
for time series, where one observes the stationary time series $\{(X_t,Z_t)\}$ and the
parameter of interest is $\varphi(z) = \mathbb{E}(X_t \mid Z_t = z)$. The sampling results in Bosq [1998] are
based on the assumption that the mixing size of $\{(X_t,Z_t)\}$ is sufficiently large (thus
excluding Cheng-Robinson type models), yielding an estimate which has the same rate as the
kernel estimator for iid random variables.

We now consider the sampling properties of $\hat\varphi$ when the observations $\{(X_t, Z_t)\}$ satisfy
the general model defined in (4.1), and dependence is quantified through its 2-mixing size,
which can be arbitrary.

We will use the following assumptions. 

Assumption 4.2 (Densities, moments and kernels). (i) Let $\mathbb{E}|X_t|^p < \infty$ for some $p > 2$
and define $g^{(p)}(z) := \mathbb{E}[|X_t|^p \mid Z_t = z] \cdot f(z)$. Then the functions $g^{(p)}$ and $f$ are
uniformly bounded and we define $q := 1 - 2/p$.

(ii) Let $f^{(t,\tau)}$ and $F^{(t,\tau)}$ be defined as in Assumption 3.1 (ii) and let $g^{(t,\tau)}$ and $G^{(t,\tau)}$
be defined as in Assumption 4.1 (ii). Then $\|F^{(t,\tau)}\|_{p_F}$ and $\|G^{(t,\tau)}\|_{p_G}$ are uniformly
bounded in $t$ and $\tau$ for some $p_F, p_G > 2$, where we define $q_F := 1 - 2/p_F$, $q_G := 1 - 2/p_G$
and $q_{FG} := q_F \wedge q_G$.

(iii) The multiplicative kernel $K$ has finite first and $p$-th moment.

We note that the assumptions above are similar to Assumption 4.1. The difference lies
in Assumption 4.1(i) and Assumption 4.2(i). Assumption 4.2(i) is in terms of moments
whereas Assumption 4.1(i) is in terms of functions.

In the following theorem we derive an error bound for the estimator $\hat\varphi$.

Theorem 4.3. Suppose the stationary time series $\{(X_t, Z_t)\}$ satisfies (4.1), and is 2-mixing
of size $\mathfrak{v}$. Furthermore, Assumption 4.2 is fulfilled for some $q_{FG}, q \in (0, 1)$.

Let the estimator $\hat\varphi(z)$ be defined as in (4.2), where $K$ is a multiplicative kernel of order
$r$. In addition assume that $\varphi \cdot f, f \in \mathfrak{G}^d_{s,A}$ for some $A, s > 0$, $f$ is bounded away from zero
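and let $\rho = r \wedge s$. Then we have for all $z \in \mathbb{R}^d$

$$|\hat\varphi(z) - \varphi(z)|^2 = O_p\big(b^{2\rho} + b^{-d}\, T^{-1} + b^{-d(1+q+q_{FG}(1-[(q\mathfrak{v})\vee 1]))}\, T^{-[(q\mathfrak{v})\wedge 1]}\big), \quad T \to \infty.$$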



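We now obtain the rates of convergence using the optimal bandwidth.

Corollary 4.4. Suppose the assumptions in Theorem 4.3 are satisfied. Let $b^* \sim T^{-\gamma/(2\rho+d)}$
with

$$\gamma := \begin{cases} 1, & q\mathfrak{v} > 1 + q/q_{FG}; \\[4pt] [(q\mathfrak{v}) \wedge 1] \cdot \dfrac{2\rho+d}{2\rho+d(1+q+q_{FG}(1-[(q\mathfrak{v})\vee 1]))}, & 1 + q/q_{FG} \ge q\mathfrak{v}, \end{cases}$$

then we have $|\hat\varphi(z) - \varphi(z)|^2 = O_p\big(T^{-2\rho\gamma/(2\rho+d)}\big)$ for all $z \in \mathbb{R}^d$.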

5 Nonparametric panel time series 

In recent years, panel time series have often been used to model the relationship and dynamics
between several observed time series. Typically we let $X_{t,i}$ denote the observation of
the $i$th individual at time $t$, where $i = 1, \dots, N$ and $t = 1, \dots, T$. We also assume that we
observe some explanatory variables $Z_{t,i}$ which are known to influence $X_{t,i}$. Several models
have been proposed to model the complex relationship between individuals, ranging from
parametric models (cf. Baltagi [2001], Hjellvik and Tjøstheim [1999], Dahlhaus and Feiler
[2005], and the references therein) to nonparametric additive models (cf. Mammen et al.
[2005]). In this section, we take the nonparametric route, and use the methods developed
in the sections above to obtain an estimator of the mean function and study its sampling
properties. The results in this section can be used in various applications; an interesting
example is the estimation of the covariance function of spatial-temporal models considered
in Johannes et al. [2007].

Let us suppose the effect of the explanatory variables is common over all individuals. To
be precise, the response and explanatory variables $\{(X_{t,i}, Z_{t,i})\}_t$ form a $(1 + d)$-dimensional
stationary vector time series which satisfies the relation

$$\mathbb{E}[X_{t,i} \mid Z_{t,i} = z] = \varphi(z) \quad \forall z \in \mathbb{R}^d,\ i \in \mathbb{N},\ t \in \mathbb{Z}. \tag{5.1}$$

We describe the dependence of $\{(X_{t,i}, Z_{t,i})\}$ by assuming it is 2-mixing over time, see
Definition 5.1 below. We note that the model considered in Hjellvik et al. [2004] and
Mammen et al. [2005] can be used as a particular example of (5.1).

We now define an estimator for $\varphi$. Note that we do not suppose that different individuals,
say $(X_{t,i},Z_{t,i})$ and $(X_{t,j},Z_{t,j})$, are identically distributed (have common densities). Let
$f_i(x,z)$ denote the joint density of the random vector $(X_{t,i},Z_{t,i})$ for $i \in \mathbb{N}$. Moreover, let
$f_i(z)$ denote the marginal density of $Z_{t,i}$. Using these densities we can rewrite (5.1) as

$$\varphi(z) = \mathbb{E}[X_{t,i} \mid Z_{t,i} = z] = \int x\, \frac{f_i(x,z)}{f_i(z)}\, dx =: \frac{g_i(z)}{f_i(z)}, \quad \forall z \in \mathbb{R}^d,\ t \in \mathbb{Z},\ i \in \mathbb{N}.$$

Furthermore, using the above, it is easily verified that

$$\varphi(z) = \frac{\frac{1}{N}\sum_{i=1}^{N} g_i(z)}{\frac{1}{N}\sum_{i=1}^{N} f_i(z)}, \quad \forall z \in \mathbb{R}^d,\ N \in \mathbb{N}, \tag{5.2}$$

which motivates the following estimator of $\varphi$.

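Given the observations $\{(X_{t,i}, Z_{t,i});\ t = 1, \dots, T;\ i = 1, \dots, N\}$ our object is to estimate
$\varphi$ and consider its sampling properties. The identity (5.2) suggests as an estimator of $\varphi(z)$

$$\hat\varphi(z) = \frac{\frac{1}{N}\sum_{i=1}^{N} \hat g_i(z)}{\frac{1}{N}\sum_{i=1}^{N} \hat f_i(z)}, \tag{5.3}$$

using for each $i = 1, \dots, N$

$$\hat g_i(z) := \frac{1}{T}\sum_{t=1}^{T} X_{t,i} K_{b_i}(Z_{t,i} - z) \quad \text{and} \quad \hat f_i(z) := \frac{1}{T}\sum_{t=1}^{T} K_{b_i}(Z_{t,i} - z) \tag{5.4}$$

as estimators of $g_i(z)$ and $f_i(z)$, respectively.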
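
The pooled estimator in (5.3) and (5.4) can be sketched in a few lines of Python for a univariate explanatory variable. The function name `panel_nw`, the Gaussian kernel, and the simulated data (including the mean function $\sin$) are assumptions made for illustration only.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard Gaussian kernel (an assumed choice of K)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def panel_nw(x, z, z0, bandwidths):
    """Pooled panel estimate of phi(z0).

    x, z       : arrays of shape (T, N), entry [t, i] belongs to individual i at time t
    bandwidths : array of shape (N,), one bandwidth b_i per individual
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    b = np.asarray(bandwidths, dtype=float)
    k = gaussian_kernel((z - z0) / b) / b   # K_{b_i}(Z_{t,i} - z0), shape (T, N)
    g_hat = (x * k).mean(axis=0)            # g_hat_i(z0) as in (5.4)
    f_hat = k.mean(axis=0)                  # f_hat_i(z0) as in (5.4)
    return g_hat.mean() / f_hat.mean()      # ratio of averages over i, as in (5.3)


# Illustrative data: independent individuals sharing the mean function sin(z).
rng = np.random.default_rng(1)
T, N = 2000, 5
z = rng.standard_normal((T, N))
x = np.sin(z) + 0.3 * rng.standard_normal((T, N))
phi_hat = panel_nw(x, z, z0=0.5, bandwidths=np.full(N, 0.2))
```

Note that the estimator averages the numerators and denominators separately before taking the ratio, so individuals with different design densities $f_i$ are weighted by how much data they have near $z_0$.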

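We quantify the dependence both over time and between individuals through their 2-mixing
rates.

Definition 5.1. The panel time series $\{(X_{t,i}, Z_{t,i})\}_t$, $i \in \mathbb{N}$, is said to be 2-mixing with
size $\mathfrak{v}$ and $\mathfrak{u}$, if for all $i,j \in \mathbb{N}$

$$\sup_{A \in \sigma(X_{t,i},Z_{t,i}),\ B \in \sigma(X_{\tau,j},Z_{\tau,j})} |P(A \cap B) - P(A)P(B)| \le C \begin{cases} |t-\tau|^{-\mathfrak{v}}, & i = j; \\ |t-\tau|^{-\mathfrak{u}}, & \text{otherwise}, \end{cases}$$

for some $C < \infty$ independent of $i$, $j$, $t$ and $\tau$.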

In the results below we will show that the rate of convergence of $\hat\varphi(\cdot)$ is determined
by the smallest mixing size $(\mathfrak{v} \wedge \mathfrak{u})$. However in the case that $\mathfrak{v} < \mathfrak{u}$ and the number of
individuals grows with $T$, the rate is determined solely by $\mathfrak{u}$.

Remark 5.1. We note that in Definition 5.1 we have two 2-mixing sizes: the size $\mathfrak{v}$ describes
the dependence of the time series $\{(X_{t,i}, Z_{t,i})\}_t$, whereas the size $\mathfrak{u}$ describes the
dependence between individuals over time. By separating these two sizes we can model
different behaviours. A simple example is $X_{t,i} = \varphi(Z_i) + \varepsilon_{t,i}$, where for a given individual
$i$ the explanatory variable $Z_i$ is fixed over time, and $\{\varepsilon_{t,i}\}$ and $\{Z_i\}$ are iid random variables.
In this example, $\mathfrak{v} = 0$ and $\mathfrak{u} = \infty$. □

To obtain the sampling properties of $\hat\varphi$ we require the following assumptions, which are
an extension of Assumption 4.2 to panel data.

Assumption 5.1 (Densities, moments and kernels). (i) For all $i \in \mathbb{N}$ let $\mathbb{E}[|X_{t,i}|^p] < \infty$
for some $p > 2$ and define $g_i^{(p)}(z) := \mathbb{E}[|X_{t,i}|^p \mid Z_{t,i} = z] \cdot f_i(z)$. Then the functions $g_i^{(p)}$
and $f_i$, for all $i \in \mathbb{N}$, are uniformly bounded and we define $q := 1 - 2/p$.

(ii) For each $t,\tau \in \mathbb{Z}$ and $i,j \in \mathbb{N}$ let $f_{i,j}^{(t,\tau)}$ denote the joint density of $(Z_{t,i},Z_{\tau,j})$ and
let $g_{i,j}^{(t,\tau)}(z_1,z_2) := \mathbb{E}[X_{t,i}X_{\tau,j} \mid Z_{t,i} = z_1, Z_{\tau,j} = z_2] \cdot f_{i,j}^{(t,\tau)}(z_1,z_2)$. Define
$F_{i,j}^{(t,\tau)} := f_{i,j}^{(t,\tau)} - f_i \otimes f_j$ and $G_{i,j}^{(t,\tau)} := g_{i,j}^{(t,\tau)} - g_i \otimes g_j$. Then $\|F_{i,j}^{(t,\tau)}\|_{p_F}$ and $\|G_{i,j}^{(t,\tau)}\|_{p_G}$ are
uniformly bounded in $i$, $j$, $t$ and $\tau$ for some $p_F, p_G > 2$. We define $q_F := 1 - 2/p_F$,
$q_G := 1 - 2/p_G$ and $q_{FG} := q_F \wedge q_G$.

(iii) The multiplicative kernel $K$ has a finite first and $p$-th moment.

We now obtain a bound for the deviation of $\hat\varphi(z)$, as $N$ is kept fixed and $T \to \infty$.

Theorem 5.1. Let us suppose that the stationary panel time series $\{(X_{t,i}, Z_{t,i})\}$ satisfies
(5.1), for all $i,j \in \mathbb{N}$, and is 2-mixing with size $\mathfrak{v}$ and $\mathfrak{u}$ (as defined in Definition 5.1).
Suppose Assumption 5.1 is fulfilled for some $q, q_{FG} \in (0, 1)$.

Let the estimator $\hat\varphi$ be defined in (5.3), where for each $i = 1, \dots, N$ the nonparametric
estimators $\hat g_i$ and $\hat f_i$ given in (5.4) are constructed using a multiplicative kernel $K$ of order
$r > 0$. In addition assume for each $i = 1, \dots, N$, that the functions $\varphi \cdot f_i$ and $f_i$ belong to
$\mathfrak{G}^d_{s_i,A}$ for some $s_i, A > 0$, $f_i$ is bounded away from zero and let $\rho_i = r \wedge s_i$. Then we have for all

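$z \in \mathbb{R}^d$

$$|\hat\varphi(z) - \varphi(z)|^2 = O_p\Big(\frac{1}{N}\sum_{i=1}^{N}\big\{b_i^{2\rho_i} + T^{-1} b_i^{-d} + T^{-[(q\mathfrak{u})\wedge 1]} \cdot b_i^{-d(1+q+q_{FG}-q_{FG}[(q\mathfrak{u})\vee 1])}\big\}$$
$$\qquad\qquad + \frac{1}{N}\sum_{i=1}^{N} N^{-1} \cdot T^{-[(q\mathfrak{v})\wedge 1]} \cdot b_i^{-d(1+q+q_{FG}-q_{FG}[(q\mathfrak{v})\vee 1])}\Big), \quad T \to \infty. \tag{5.5}$$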

Comparing Theorem 4.3 with the theorem above, we see, besides the summation $\frac{1}{N}\sum_{i=1}^{N}$,
the addition of an extra term $T^{-[(q\mathfrak{u})\wedge 1]} \cdot b_i^{-d(1+q+q_{FG}-q_{FG}[(q\mathfrak{u})\vee 1])}$, due to the dependence
between individuals over time. Altogether this implies that the bound can be partitioned into
nine different cases, depending on the values of $\mathfrak{u}$ and $\mathfrak{v}$ (compare this with the bound in
Theorem 4.3, which can be partitioned into three cases). However, the nine different bounds
can be grouped into two main cases: $\mathfrak{u} < \mathfrak{v}$ and $\mathfrak{u} > \mathfrak{v}$. We consider these two cases in
the corollaries below.

If $\mathfrak{u} < \mathfrak{v}$, we notice that the third term dominates the fourth term; in other words there
is a larger dependence between individuals over time than for each individual over time.
We consider this case below.

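Corollary 5.2. Suppose the assumptions in Theorem 5.1 are satisfied and $\mathfrak{u} < \mathfrak{v}$. For
each $i = 1, \dots, N$, let $b_i^* \sim T^{-\gamma_i/(2\rho_i+d)}$ with

$$\gamma_i := \begin{cases} 1, & q\mathfrak{u} > 1 + q/q_{FG}; \\[4pt] [(q\mathfrak{u}) \wedge 1] \cdot \dfrac{2\rho_i+d}{2\rho_i+d(1+q+q_{FG}(1-[(q\mathfrak{u})\vee 1]))}, & 1 + q/q_{FG} \ge q\mathfrak{u}, \end{cases} \tag{5.6}$$

then for all $z \in \mathbb{R}^d$ we have $|\hat\varphi(z) - \varphi(z)|^2 = O_p\big(\frac{1}{N}\sum_{i=1}^{N} T^{-2\rho_i\gamma_i/(2\rho_i+d)}\big)$, $T \to \infty$.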

Studying the corollary above we see that the rate of convergence is determined by $\mathfrak{u}$.
In other words, there is no benefit in the estimation from including several individuals $N$.
Furthermore we see that the 'usual' nonparametric rate of convergence is only achieved if
$\mathfrak{u} > 1/q + 1/q_{FG}$.

Let us now consider the situation where $\mathfrak{v} < \mathfrak{u}$, that is there is less dependence between
two different individuals over time than for one individual observed over time (see Remark 5.1
for an example). This scenario is more likely to arise in real applications. In this case, the
fourth term dominates the third term in (5.5), which suggests that increasing the number of
individuals does yield a faster rate of convergence. We notice that the usual nonparametric
rate of convergence can only be obtained if $\mathfrak{u} > 1/q_{FG} + 1/q$.

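Corollary 5.3. Suppose the assumptions of Theorem 5.1 are satisfied and $\mathfrak{v} < \mathfrak{u}$. For
each $i = 1, \dots, N$, let $b_i^* \sim N^{-\theta/(2\rho_i+d)} \cdot T^{-\delta_i/(2\rho_i+d)}$ with $\theta \ge 0$ and

$$\delta_i := \begin{cases} 1, & q\mathfrak{v} > 1 + q/q_{FG}; \\[4pt] [(q\mathfrak{v}) \wedge 1] \cdot \dfrac{2\rho_i+d}{2\rho_i+d(1+q+q_{FG}(1-[(q\mathfrak{v})\vee 1]))}, & 1 + q/q_{FG} \ge q\mathfrak{v}. \end{cases} \tag{5.7}$$

Then for all $z \in \mathbb{R}^d$ we have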

N 



i=l 



iV-^^^-t^^")"^]]}), r^oo, (5.8) 



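where $\gamma_i$ is defined in (5.6).

Studying the corollary above, we see that if $N$ is kept fixed, then the terms inside the inner
bracket of (5.8) are of order $O_p(1)$, therefore the rate of convergence is $O_p\big(\frac{1}{N}\sum_{i=1}^{N} T^{-2\rho_i\delta_i/(2\rho_i+d)}\big)$.
Thus, combining Corollaries 5.2 and 5.3 we have for arbitrary $\mathfrak{v}$ and $\mathfrak{u}$ the rate
$O_p\big(\frac{1}{N}\sum_{i=1}^{N} T^{-2\rho_i[\gamma_i \wedge \delta_i]/(2\rho_i+d)}\big)$. Therefore, we obtain the usual nonparametric rate if
$(\mathfrak{v} \wedge \mathfrak{u}) > 1/q_{FG} + 1/q$.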

Altogether the corollaries above imply that the rate of convergence depends on the 
slowest mixing rate, within or between the individuals. Let us suppose that d < u, if we 
closely examine (5.8) we see if Q is chosen such that Q < 7i[(^u) A 1)], 7j < 5i[{qu) A 1] 
(noting that the former inequality implies the later, since d < u) and (5i — 7j) > (which 
is the case when $\mathfrak{v} < \mathfrak{u}$), then the terms inside the inner bracket of (5.8) become small for
large $N$. This means that increasing the number of individuals leads to a faster rate of
convergence. We show in the corollary below that if we allow the number of individuals to
grow as $T$ grows, then the rate of convergence will depend only on $\mathfrak{u}$ and no longer on the
smaller $\mathfrak{v}$ (unlike the case that $N$ is fixed).

Corollary 5.4. Suppose the assumptions in Theorem 5.1 are satisfied and let d < u. 
Furthermore assume there exists a Q > such that for each i G N, N^^ ~ j'(7i-<5i)^ where 
7i and 6i are defined in (5.6) and (5.7), respectively and 5i/[{qX)) A 1] > Qi. Then given 

^ 2p 

|(^(z)-(^(z)|2 = Op(-5^r-^-^^), r^oo. (5.9) 

1=1 
We see from the corollary above if the number of individuals, $N$, grows at the rate
N ^ T '^i , where cannot be too large, in particular 6i[{qt>) A 1] > Q, then the rate of 
convergence depends only on the mixing size $\mathfrak{u}$ (compare this with Corollaries 5.2 and 5.3,
where the rate depends on $(\mathfrak{v} \wedge \mathfrak{u})$). Furthermore if $q\mathfrak{u} > q/q_{FG} + 1$ then we have the
usual nonparametric rate $|\hat\varphi(z) - \varphi(z)|^2 = O_p\big(\frac{1}{N}\sum_{i=1}^{N} T^{-2\rho_i/(2\rho_i+d)}\big)$. In the special case that
$\varphi$ and all the marginal densities $f_i$ belong to the same smoothness class, that is for all $i$,
p = Pi, then (5.9) simplifies to \(p{z) - ^p{z)\'^ = Op(^r"2^y 

Remark 5.2. It is worth mentioning that similar results to those in Theorem 5.1 and 
Corollaries 5.2, 5.3 and 5.4 can be obtained for the model (4.3) considered in Section 4. □ 

6 Discussion 

In this paper we have considered nonparametric estimation for dependent data, focusing
on the case that the observations are nonlinear and highly dependent. We have obtained
bounds for the kernel density estimator and also rates of convergence for two types of
nonparametric regression models, both using the 2-mixing dependence measure. We show that
when the assumption of linearity is relaxed, the rate of convergence does not necessarily
depend on the autocovariance function of the observations. We demonstrate that 2-mixing
is a natural measure of dependence for panel data and obtain rates of convergence for the
common mean function in panel time series.

As we are working under relatively weak conditions, we do not claim that the bounds
obtained are minimax. However, the bounds can be considered as the worst case scenario
for the nonparametric estimator. In future work, it would be of interest to investigate whether the
bounds in the paper are indeed close to minimax for certain nonlinear time series. In this
paper we have derived bounds for the estimator using the optimal bandwidth. However, the
optimal bandwidth is constructed under the assumption that the 2-mixing size is known. It
would also be of interest to develop bandwidth selection methods for the case that the 2-mixing size
of the observations is unknown.
A Appendix: Proofs

A.1 Proofs: Nonparametric density estimation

We now prove the results in Section 3. We mention that Theorem 3.2 is stated for a
univariate time series $\{Z_t\}$, however the proofs of the results in Sections 4 and 5 require
results in the multivariate case. Therefore to save space, we give the proof of Theorem 3.2
for a $d$-dimensional vector time series $\{Z_t\}$.

Lemma A.1. Suppose the time series $\{Z_t\}$ is 2-mixing with size $\mathfrak{v}$ and Assumption 3.1 is
fulfilled for some $q_F \in (0, 1)$. If $1 \le t,\tau \le T$, then$^1$

$$|\mathrm{cov}\{K_b(Z_t - z), K_b(Z_\tau - z)\}| \lesssim \min\big(b^{-d(1-q_F)};\ b^{-2d}|t-\tau|^{-\mathfrak{v}}\big). \tag{A.1}$$

Proof. Writing the covariance as an integral, and using the notation in Assumption 3.1
(ii) we have

$$\mathrm{cov}\{K_b(Z_t - z), K_b(Z_\tau - z)\} = \int K_b(u - z) K_b(v - z) F^{(t,\tau)}(u,v)\, du\, dv.$$

Now by using Hölder's inequality with $p_F^{-1} + \bar p_F^{-1} = 1$, and recalling that $q_F = 1 - 2/p_F$, it
is clear that

$$|\mathrm{cov}\{K_b(Z_t - z), K_b(Z_\tau - z)\}| \le b^{-2d/p_F} \|K\|^2_{\bar p_F} \cdot \|F^{(t,\tau)}\|_{p_F} \lesssim b^{-d(1-q_F)}.$$

Using Assumption 3.1 we have that $\|F^{(t,\tau)}\|_{p_F}$ is uniformly bounded and by using Lyapunov's
inequality $\|K\|_{\bar p_F} < \infty$ for all $1 \le \bar p_F \le 2$. This gives us the first bound in (A.1). On the
other hand, under Assumption 3.1 (i) the kernel $K$ is uniformly bounded and therefore,
using the 2-mixing property of $\{Z_t\}$ together with Hall and Heyde [1980], Theorem A.5, we
obtain

$$|\mathrm{cov}\{K_b(Z_t - z), K_b(Z_\tau - z)\}| \lesssim b^{-2d} \cdot |t-\tau|^{-\mathfrak{v}},$$

which gives the second bound in (A.1). □
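
The power of $b$ in the first bound comes from the scaling of the kernel norm. The short derivation below is a sketch that assumes the usual convention $K_b(u) = b^{-d}K(u/b)$ and writes $\bar p$ for the conjugate exponent of $p_F$ (a symbol introduced here for illustration).

```latex
\begin{align*}
\|K_b(\cdot - z)\|_{\bar p}^{\bar p}
  &= \int b^{-d\bar p}\,\bigl|K\bigl((u-z)/b\bigr)\bigr|^{\bar p}\,du
   = b^{-d(\bar p - 1)}\,\|K\|_{\bar p}^{\bar p}, \\
\|K_b(\cdot - z)\|_{\bar p}
  &= b^{-d(1 - 1/\bar p)}\,\|K\|_{\bar p}
   = b^{-d/p_F}\,\|K\|_{\bar p}, \\
\Bigl|\int K_b(u-z)K_b(v-z)F^{(t,\tau)}(u,v)\,du\,dv\Bigr|
  &\le \|K_b\|_{\bar p}^{2}\,\|F^{(t,\tau)}\|_{p_F}
   = b^{-2d/p_F}\,\|K\|_{\bar p}^{2}\,\|F^{(t,\tau)}\|_{p_F}.
\end{align*}
% Since q_F = 1 - 2/p_F, the exponent is 2d/p_F = d(1 - q_F).
```

The second line uses $1 - 1/\bar p = 1/p_F$, and the last line is Hölder's inequality applied to the product kernel on $\mathbb{R}^d \times \mathbb{R}^d$.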

Proof of Theorem 3.2. We mention that parts of the following proof are motivated
by techniques used in Bosq [1998], where nonparametric smoothing was considered for
univariate time series. Consider the standard variance-bias decomposition

$$\mathbb{E}|\hat f(z) - f(z)|^2 = \mathrm{var}(\hat f(z)) + |\mathbb{E}\hat f(z) - f(z)|^2. \tag{A.2}$$

Under the stated assumptions we will derive the following two bounds, which together give
the result of the theorem. The bias is bounded by

$$|\mathbb{E}\hat f(z) - f(z)|^2 \lesssim b^{2\rho}, \tag{A.3}$$

while for the variance we have

$$\mathrm{var}(\hat f(z)) \lesssim T^{-1} \cdot b^{-d} + T^{-[\mathfrak{v}\wedge 1]} \cdot b^{-d(2+q_F-q_F[\mathfrak{v}\vee 1])}. \tag{A.4}$$

$^1$We write $A \lesssim B$ if there exists a positive constant $c$ such that $A \le cB$.
Proof of (A.3). We can write

$$\mathbb{E}\hat f(z) = \frac{1}{T}\sum_{t=1}^{T} \mathbb{E}\big(K_b(Z_t - z)\big) = \int du\, f(u) K_b(u - z).$$

Since $f \in \mathfrak{G}^d_{s,A}$ and $K$ is a multiplicative kernel of order $r$ with $\int du\, |u|^{\rho} K(u) \le S_K$, using a
Taylor expansion up to the order $\rho = \min(r, s)$ leads to $\mathbb{E}\hat f(z) = f(z) + b^{\rho}R$ with remainder
$|R| \le A S_K < \infty$, which proves (A.3).

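In order to prove (A.4), we consider the expansion

$$\mathrm{var}(\hat f(z)) = \frac{1}{T^2}\sum_{t=1}^{T} \mathrm{var}\{K_b(Z_t - z)\} + \frac{2}{T^2}\sum_{t>\tau} \mathrm{cov}\{K_b(Z_t - z), K_b(Z_\tau - z)\} =: A_1 + A_2. \tag{A.5}$$

We will show that $|A_1| \lesssim T^{-1} \cdot b^{-d}$ and

$$|A_2| \lesssim \begin{cases} T^{-\mathfrak{v}} \cdot b^{-2d}, & 0 < \mathfrak{v} < 1; \\ T^{-1} \cdot \big(b^{-d} + b^{-d(2+q_F-q_F\mathfrak{v})}\big), & \mathfrak{v} > 1. \end{cases} \tag{A.6}$$

Furthermore, if $0 < \mathfrak{v} < 1/q_F + 1$ then $|A_1|$ is dominated by $|A_2|$. Whereas for $\mathfrak{v} > 1/q_F + 1$
the terms $|A_1|$ and $|A_2|$ are of the same order $O(T^{-1}b^{-d})$. Therefore, the bounds derived
for $|A_2|$ will lead to (A.4).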

First let us consider $A_1$. Due to stationarity, we have the bound

$$T \cdot A_1 \le \mathbb{E}[K_b^2(Z_1 - z)] = \int du\, f(u) K_b^2(u - z).$$

Since under the stated assumptions $\|K\|_2 < \infty$ and the density $f$ is uniformly bounded this
leads to $A_1 \lesssim T^{-1} \cdot b^{-d}$.

The term $T \cdot |A_2|$ is bounded by the sum $4\sum_{t=2}^{T} |\mathrm{cov}\{K_b(Z_t - z), K_b(Z_1 - z)\}|$. If $\mathfrak{v} < 1$
then we estimate the sum using the second bound in Lemma A.1, i.e., $T \cdot |A_2| \lesssim T^{1-\mathfrak{v}} b^{-2d}$,
which is the first bound in (A.6). On the other hand if $\mathfrak{v} > 1$ we partition the sum into two
parts which we estimate separately using the bounds in Lemma A.1, thus giving us

$$T \cdot |A_2| \lesssim \Big[\sum_{t=2}^{h} b^{-d(1-q_F)} + \sum_{t=h+1}^{T} b^{-2d}\, t^{-\mathfrak{v}}\Big] \lesssim \big[h \cdot b^{-d(1-q_F)} + h^{-\mathfrak{v}+1} \cdot b^{-2d}\big].$$

Thereby using $h \approx b^{-dq_F}$ we obtain $T \cdot |A_2| \lesssim b^{-d} + b^{-d(2+q_F-q_F\mathfrak{v})}$, the second bound
in (A.6). Thus we have proved (A.4). □
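
For readers who want the substitution spelled out, the following sketch plugs the cut-off $h \approx b^{-dq_F}$ into the two parts of the split sum for the case of mixing size $\mathfrak{v} > 1$; it only restates the algebra of the proof and introduces no new assumptions.

```latex
\begin{align*}
h \cdot b^{-d(1-q_F)}
  &= b^{-dq_F}\, b^{-d(1-q_F)} = b^{-d}, \\
h^{-\mathfrak{v}+1} \cdot b^{-2d}
  &= b^{-dq_F(1-\mathfrak{v})}\, b^{-2d}
   = b^{-d(2+q_F-q_F\mathfrak{v})}, \\
b^{-d(2+q_F-q_F\mathfrak{v})} \le b^{-d}
  &\iff 2+q_F-q_F\mathfrak{v} \le 1
   \iff \mathfrak{v} \ge 1 + 1/q_F \qquad (0 < b < 1).
\end{align*}
```

The last equivalence is the threshold $1/q_F + 1$ quoted above: beyond it the covariance terms no longer change the order $T^{-1}b^{-d}$ of the variance.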

Proof of Corollary 3.3. Under the assumption on the bandwidth the result is obtained
by balancing the terms in the bound given in Theorem 3.2. □

A.2 Proofs: Nonparametric regression

We now prove the results in Section 4.

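Lemma A.2. Suppose the stationary time series $\{(X_t,Z_t)\}$ satisfies (4.3), and $\{Z_t\}$ is 2-mixing
with size $\mathfrak{v}$ and the autocovariances of the time series $\{\eta_t\}$ have size $\mathfrak{u}$ (see Definition
3.1). Suppose Assumption 4.1 is fulfilled for some $q, q_G \in (0, 1)$. If $1 \le t,\tau \le T$, then

$$|\mathrm{cov}\{X_tK_b(Z_t - z), X_\tau K_b(Z_\tau - z)\}| \lesssim b^{-d(1-q_G)}, \tag{A.7}$$
$$|\mathrm{cov}\{\varphi(Z_t)K_b(Z_t - z), \varphi(Z_\tau)K_b(Z_\tau - z)\}| \lesssim b^{-d(1+q)}|t-\tau|^{-q\mathfrak{v}}, \tag{A.8}$$
$$|\mathrm{cov}\{h(Z_t)K_b(Z_t - z)\eta_t, h(Z_\tau)K_b(Z_\tau - z)\eta_\tau\}| \lesssim b^{-d}|t-\tau|^{-\mathfrak{u}}. \tag{A.9}$$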

Proof. Using the notation in Assumption 4.1 together with Hölder's inequality, and recalling
that $q_G = 1 - 2/p_G$ with $p_G^{-1} + \bar p_G^{-1} = 1$, we have

$$|\mathrm{cov}\{X_tK_b(Z_t - z), X_\tau K_b(Z_\tau - z)\}| \lesssim b^{-d(1-q_G)},$$

where we use that under Assumption 4.1, $K$ has a finite $\bar p_G$-th moment for $1 \le \bar p_G \le p$ (by Lyapunov's
inequality) and $\|G^{(t,\tau)}\|_{p_G}$ is uniformly bounded. This gives us (A.7).

We now prove (A.8). Under Assumption 4.1 the function $|\varphi|^p \cdot f$ is uniformly bounded
and $\|K\|_p$ is finite for some $p = 2/(1-q) > 2$, therefore we have $[\mathbb{E}|\varphi(Z_1)K_b(Z_1 - z)|^p]^{2/p} \lesssim
b^{-d(q+1)}$. Using the 2-mixing property of $\{Z_t\}$ together with Hall and Heyde [1980], Theorem
A.6, we obtain (A.8).

We now prove (A.9). The series $\{Z_t\}$ and $\{\eta_t\}$ are independent, therefore expanding
the term $A := \mathrm{cov}\{h(Z_t)K_b(Z_t - z)\eta_t, h(Z_\tau)K_b(Z_\tau - z)\eta_\tau\}$ gives

$$A = \mathrm{cov}(\eta_t,\eta_\tau) \cdot \mathbb{E}[h(Z_t)K_b(Z_t - z)h(Z_\tau)K_b(Z_\tau - z)].$$

Since the covariance of the time series $\{\eta_t\}$ has size $\mathfrak{u}$, applying the Cauchy-Schwarz
inequality gives

$$|A| \lesssim |t-\tau|^{-\mathfrak{u}} \cdot \mathbb{E}|h(Z_1)K_b(Z_1 - z)|^2.$$

Under Assumption 4.1 the function $h^2 \cdot f$ is uniformly bounded and $\|K\|_2 < \infty$, therefore
$\mathbb{E}|h(Z_1)K_b(Z_1 - z)|^2 \lesssim b^{-d}$, and hence we obtain (A.9). □
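
The exponent $-d(q+1)$ used for (A.8) follows from a one-line moment calculation. The sketch below assumes the convention $K_b(u) = b^{-d}K(u/b)$ and treats the cited covariance inequality as being of the generic form $|\mathrm{cov}(U,V)| \lesssim \|U\|_p \|V\|_p\, \alpha^{1-2/p}$, where $\alpha$ is the 2-mixing coefficient; the precise statement is the one in Hall and Heyde [1980].

```latex
\begin{align*}
\mathbb{E}\bigl|\varphi(Z_1)K_b(Z_1-z)\bigr|^p
  &= \int |\varphi(u)|^p f(u)\,|K_b(u-z)|^p\,du
   \le \sup_u\bigl\{|\varphi(u)|^p f(u)\bigr\}\; b^{-d(p-1)}\,\|K\|_p^p, \\
\bigl[\mathbb{E}\bigl|\varphi(Z_1)K_b(Z_1-z)\bigr|^p\bigr]^{2/p}
  &\lesssim b^{-2d(p-1)/p} = b^{-d(2-2/p)} = b^{-d(1+q)},
  \qquad q = 1 - 2/p, \\
\alpha^{1-2/p}
  &\lesssim \bigl(|t-\tau|^{-\mathfrak{v}}\bigr)^{q} = |t-\tau|^{-q\mathfrak{v}}.
\end{align*}
```

Multiplying the last two lines gives the right-hand side of (A.8), which explains why the mixing size of the design always enters through the product of $q$ and the mixing size.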

Lemma A.3. Suppose the stationary time series $\{Z_t\}$ is 2-mixing with size $\mathfrak{v}$ and Assumption
4.1 is fulfilled for some $q, q_F \in (0, 1)$. If $1 \le t,\tau \le T$, then

$$|\mathrm{cov}\{K_b(Z_t - z), K_b(Z_\tau - z)\}| \lesssim \min\big(b^{-d(1-q_F)};\ b^{-d(1+q)}|t-\tau|^{-q\mathfrak{v}}\big). \tag{A.10}$$

Proof. The proof is very similar to the proof of Lemma A.2 and we omit the details. □
Lemma A.4. Suppose the assumptions in Theorem 4.1 are satisfied. Let $\hat g$ be defined as in
(4.2). Then we have

$$\mathbb{E}|\hat g(z) - g(z)|^2 \lesssim b^{2\rho} + b^{-d}T^{-1} + b^{-d(1+q+q_G(1-[(q\mathfrak{v})\vee 1]))}\, T^{-[(q\mathfrak{v})\wedge 1]} + b^{-d}\, T^{-(\mathfrak{u}\wedge 1)}. \tag{A.11}$$



Proof. Consider the standard variance-bias decomposition

$$\mathbb{E}|\hat g(z) - g(z)|^2 = \mathrm{var}(\hat g(z)) + |\mathbb{E}\hat g(z) - g(z)|^2. \tag{A.12}$$

Under the stated assumptions we will derive the following two bounds, which together
give the estimate in (A.11). The bias is bounded by

$$|\mathbb{E}\hat g(z) - g(z)|^2 \lesssim b^{2\rho}, \tag{A.13}$$

while for the variance we have

$$\mathrm{var}(\hat g(z)) \lesssim b^{-d}T^{-1} + b^{-d(1+q+q_G(1-[(q\mathfrak{v})\vee 1]))}\, T^{-[(q\mathfrak{v})\wedge 1]} + b^{-d}\, T^{-(\mathfrak{u}\wedge 1)}. \tag{A.14}$$

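We first prove (A.13). We can write

$$\mathbb{E}\hat g(z) = \frac{1}{T}\sum_{t=1}^{T} \mathbb{E}\big(\mathbb{E}[X_t \mid Z_t]K_b(Z_t - z)\big) = \int du\, g(u) K_b(u - z).$$

Since $g \in \mathfrak{G}^d_{s,A}$ and $K$ is a multiplicative kernel of order $r$ with $\int du\, |u|^{\rho} K(u) \le S_K$, using a
Taylor expansion up to the order $\rho = \min(r, s)$ leads to $\mathbb{E}\hat g(z) = g(z) + b^{\rho}R$ with remainder
$|R| \le A S_K < \infty$, which proves (A.13).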

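In order to prove (A.14), we consider the expansion

$$\mathrm{var}(\hat g(z)) = \frac{1}{T^2}\sum_{t=1}^{T} \mathrm{var}\{X_tK_b(Z_t - z)\} + \frac{2}{T^2}\sum_{t>\tau} \mathrm{cov}\{X_tK_b(Z_t - z), X_\tau K_b(Z_\tau - z)\} =: A_1 + A_2. \tag{A.15}$$

We will show that $|A_1| \lesssim T^{-1} \cdot b^{-d} + T^{-(\mathfrak{u}\wedge 1)} \cdot b^{-d}$ and

$$|A_2| \lesssim \begin{cases} T^{-q\mathfrak{v}} \cdot b^{-d(1+q)} + T^{-(\mathfrak{u}\wedge 1)} \cdot b^{-d}, & q\mathfrak{v} < 1; \\ T^{-1} \cdot b^{-d(1+q+q_G(1-q\mathfrak{v}))} + T^{-(\mathfrak{u}\wedge 1)} \cdot b^{-d}, & 1 \le q\mathfrak{v}. \end{cases} \tag{A.16}$$

Furthermore, if $0 < q\mathfrak{v} < q/q_G + 1$ then we show that $|A_1|$ is dominated by $|A_2|$. Whereas
for $q\mathfrak{v} > q/q_G + 1$ the terms $|A_1|$ and $|A_2|$ are of the same order $O(T^{-1} \cdot b^{-d} + T^{-(\mathfrak{u}\wedge 1)} \cdot b^{-d})$.
Therefore, the bounds derived for $|A_2|$ will lead to the estimates in (A.14).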

First let us consider $A_1$. Due to stationarity of the process, we have the bound

$$T \cdot A_1 \le \mathbb{E}|X_1K_b(Z_1 - z)|^2 \lesssim \mathbb{E}|\varphi(Z_1)K_b(Z_1 - z)|^2 + \mathbb{E}|h(Z_1)K_b(Z_1 - z)|^2.$$

Under the stated assumptions the functions $|\varphi|^p \cdot f$ with $p > 2$ and $h^2 \cdot f$ are uniformly
bounded and the kernel satisfies $\|K\|_2 < \infty$, therefore $A_1 \lesssim T^{-1} \cdot b^{-d}$.
Let us now consider the term $A_2$, which is bounded by

$$T \cdot |A_2| \le 4\sum_{t=2}^{T} |\mathrm{cov}\{X_tK_b(Z_t - z), X_1K_b(Z_1 - z)\}|, \tag{A.17}$$

where using representation (4.3) the $t$-th summand in (A.17) can be estimated by

$$|\mathrm{cov}\{\varphi(Z_t)K_b(Z_t - z), \varphi(Z_1)K_b(Z_1 - z)\}| + |\mathrm{cov}\{h(Z_t)K_b(Z_t - z)\eta_t, h(Z_1)K_b(Z_1 - z)\eta_1\}|. \tag{A.18}$$

If $q\mathfrak{v} < 1$ and $\mathfrak{u} < 1$ then we bound the sum (A.17) using (A.18), i.e.,

$$T \cdot |A_2| \lesssim \sum_{t=2}^{T} |\mathrm{cov}\{\varphi(Z_t)K_b(Z_t - z), \varphi(Z_1)K_b(Z_1 - z)\}| + \sum_{t=2}^{T} |\mathrm{cov}\{h(Z_t)K_b(Z_t - z)\eta_t, h(Z_1)K_b(Z_1 - z)\eta_1\}|. \tag{A.19}$$

We use the bounds (A.8) and (A.9) in Lemma A.2 to estimate each of the sums in (A.19)
separately, which gives

$$T \cdot |A_2| \lesssim T^{-q\mathfrak{v}+1} \cdot b^{-d(1+q)} + T^{-\mathfrak{u}+1} \cdot b^{-d}. \tag{A.20}$$

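On the other hand if $q\mathfrak{v} > 1$ or if $\mathfrak{u} > 1$ we partition the sum (A.17) into two parts, where
we estimate the first part using the bound (A.7) in Lemma A.2 and the second using (A.18),
thus giving us

$$T \cdot |A_2| \lesssim h \cdot b^{-d(1-q_G)} + \sum_{t=h+1}^{T} |\mathrm{cov}\{\varphi(Z_t)K_b(Z_t - z), \varphi(Z_1)K_b(Z_1 - z)\}| + \sum_{t=h+1}^{T} |\mathrm{cov}\{h(Z_t)K_b(Z_t - z)\eta_t, h(Z_1)K_b(Z_1 - z)\eta_1\}|. \tag{A.21}$$

We use the bounds (A.8) and (A.9) in Lemma A.2 to estimate each of the sums in (A.21)
separately, which gives

$$T \cdot |A_2| \lesssim h \cdot b^{-d(1-q_G)} + \begin{cases} T^{-q\mathfrak{v}+1} \cdot b^{-d(1+q)} + h^{-\mathfrak{u}+1} \cdot b^{-d}, & q\mathfrak{v} < 1 \text{ and } \mathfrak{u} > 1; \\ h^{-q\mathfrak{v}+1} \cdot b^{-d(1+q)} + T^{-\mathfrak{u}+1} \cdot b^{-d}, & q\mathfrak{v} > 1 \text{ and } \mathfrak{u} < 1; \\ h^{-q\mathfrak{v}+1} \cdot b^{-d(1+q)} + h^{-\mathfrak{u}+1} \cdot b^{-d}, & q\mathfrak{v} > 1 \text{ and } \mathfrak{u} > 1. \end{cases} \tag{A.22}$$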

Thereby using $h \approx b^{-dq_G}$ we obtain

$$T \cdot |A_2| \lesssim b^{-d} + \begin{cases} T^{-q\mathfrak{v}+1} \cdot b^{-d(1+q)} + b^{-d(1+q_G(1-\mathfrak{u}))}, & q\mathfrak{v} < 1 \text{ and } \mathfrak{u} > 1; \\ b^{-d(1+q+q_G(1-q\mathfrak{v}))} + T^{-\mathfrak{u}+1} \cdot b^{-d}, & q\mathfrak{v} > 1 \text{ and } \mathfrak{u} < 1; \\ b^{-d(1+q+q_G(1-q\mathfrak{v}))} + b^{-d(1+q_G(1-\mathfrak{u}))}, & q\mathfrak{v} > 1 \text{ and } \mathfrak{u} > 1, \end{cases} \tag{A.23}$$

and hence, combining (A.20) and (A.23) gives the bound (A.16) for the term $A_2$. □

We now state a slight variation of Theorem 3.2, where $f$ can satisfy slightly weaker
conditions. We use this result to prove Theorem 4.1.

Lemma A.5. Suppose the stationary time series $\{Z_t\}$ is 2-mixing with size $\mathfrak{v}$ and Assumption
4.1 is fulfilled for some $q, q_F \in (0, 1)$. Let $\hat f$ be defined as in (3.1), where the
multiplicative kernel is of order $r > 0$. In addition assume that the function $f$ belongs to $\mathfrak{G}^d_{s,A}$
for $s, A > 0$ and let $\rho := \min(r, s)$. Then we have

$$\mathbb{E}|\hat f(z) - f(z)|^2 \lesssim b^{2\rho} + b^{-d}T^{-1} + b^{-d(1+q+q_F(1-[(q\mathfrak{v})\vee 1]))}\, T^{-[(q\mathfrak{v})\wedge 1]}. \tag{A.24}$$

Proof. Under the stated assumptions, using Lemma A.3, the proof is very similar to the
proof of Lemma A.4 and we omit the details. □

Proof of Theorem 4.1. Consider the decomposition

$$\hat\varphi(z) - \varphi(z) = \frac{\hat g(z)}{\hat f(z)} - \frac{g(z)}{f(z)} = \frac{\hat g(z) - \hat f(z)\varphi(z)}{f(z)} + \frac{f(z) - \hat f(z)}{\hat f(z)} \cdot \frac{\hat g(z) - \hat f(z)\varphi(z)}{f(z)}.$$

We first note that Lemma A.5 gives $\mathbb{E}|\hat f(z) - f(z)|^2 = o(1)$, which implies that $|\hat f(z)^{-1}|$
is bounded in probability. Therefore the second term in the above expansion is of order
$o_p\big(\{\hat g(z) - \hat f(z)\varphi(z)\}/f(z)\big)$, hence in the decomposition above the second term is negligible
in comparison to the first term. Thereby bounding the first term of the decomposition we
obtain the result. By using Lemmas A.4 and A.5 and noting that $q_{FG} = q_F \wedge q_G$ we obtain
Theorem 4.1. □
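
The decomposition used in this proof is a purely algebraic identity. As a sketch of the check, write $D := \hat g(z) - \hat f(z)\varphi(z)$ (a shorthand introduced here only for this verification) and recall $g = \varphi \cdot f$:

```latex
\begin{align*}
\frac{D}{f(z)} + \frac{f(z) - \hat f(z)}{\hat f(z)} \cdot \frac{D}{f(z)}
  &= \frac{D}{f(z)}\Bigl(1 + \frac{f(z) - \hat f(z)}{\hat f(z)}\Bigr)
   = \frac{D}{f(z)} \cdot \frac{f(z)}{\hat f(z)}
   = \frac{D}{\hat f(z)} \\
  &= \frac{\hat g(z)}{\hat f(z)} - \varphi(z)
   = \hat\varphi(z) - \varphi(z).
\end{align*}
```

The identity shows that the leading term is $D/f(z)$, whose mean squared size is controlled by the bounds for $\hat g$ and $\hat f$ in the two lemmas.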

Proof of Corollary 4.2 Under the assumption on the bandwidth the result is obtained 
by balancing the terms in the bound given in Theorem 4.1. □ 

Lemma A.6. Suppose the stationary vector time series $\{(X_t,Z_t)\}$ is 2-mixing with size $\mathfrak{v}$
and Assumption 4.2 is fulfilled for some $q, q_G \in (0, 1)$. If $1 \le t,\tau \le T$, then

$$|\mathrm{cov}\{X_tK_b(Z_t - z), X_\tau K_b(Z_\tau - z)\}| \lesssim \min\big(b^{-d(1-q_G)},\ b^{-d(1+q)}|t-\tau|^{-q\mathfrak{v}}\big). \tag{A.25}$$

Proof. Under Assumption 4.2 (ii) the first bound in (A.25) follows in the same way as (A.7) in Lemma A.2.
On the other hand, under Assumption 4.2 (i,iii) the function $\mathbb{E}[|X_1|^p \mid Z_1 = \cdot\,] \cdot f$ is uniformly
bounded and $\|K\|_p$ is finite for some $p = 2/(1 - q) > 2$, therefore we have
$[\mathbb{E}|X_1K_b(Z_1 - z)|^p]^{2/p} \lesssim b^{-d(q+1)}$. Using the 2-mixing property of $\{(X_t,Z_t)\}$ together with Hall and Heyde
[1980], Theorem A.6, we obtain the second bound in (A.25). □
Lemma A.7. Suppose the stationary vector time series $\{(X_t,Z_t)\}$ is 2-mixing with size $\mathfrak{v}$
and Assumption 4.2 is fulfilled for some $q, q_G \in (0, 1)$. Let $\hat g$ be defined as in (4.2), where
the multiplicative kernel is of order $r > 0$. In addition assume that the function $g = \varphi \cdot f$
belongs to $\mathfrak{G}^d_{s,A}$ for $s, A > 0$ and let $\rho := \min(r, s)$. Then we have

$$\mathbb{E}|\hat g(z) - g(z)|^2 \lesssim b^{2\rho} + b^{-d}T^{-1} + b^{-d(1+q+q_G(1-[(q\mathfrak{v})\vee 1]))}\, T^{-[(q\mathfrak{v})\wedge 1]}. \tag{A.26}$$

Proof. Under the stated assumptions, using Lemma A.6, the proof is very similar to the
proof of Lemma A.4 and we omit the details. □

Proof of Theorem 4.3. Using Lemmas A.5 and A.7 we obtain the result using a similar
proof to that of Theorem 4.1. □

Proof of Corollary 4.4. Under the assumption on the bandwidth the result is obtained
by balancing the terms in the bound given in Theorem 4.3. □

A.3 Proofs: Nonparametric regression for panel time series

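Lemma A.8. Suppose $\{X_{t,i}\}$ satisfies (5.1), for all $i,j \in \mathbb{N}$, $\{(X_{t,i}, Z_{t,i}, X_{t,j}, Z_{t,j})\}$ is a
stationary time series, and the panel time series $\{(X_{t,i}, Z_{t,i})\}$ is 2-mixing with size $\mathfrak{v}$ and
$\mathfrak{u}$ (as defined in Definition 5.1) and Assumption 5.1 is satisfied for some $q_G, q \in (0, 1)$. If
$1 \le t,\tau \le T$ and $1 \le i < j \le N$, then

$$|\mathrm{cov}\{X_{t,i}K_{b_i}(Z_{t,i} - z), X_{\tau,j}K_{b_j}(Z_{\tau,j} - z)\}| \lesssim \min\big((b_ib_j)^{-\frac{d}{2}(1-q_G)};\ (b_ib_j)^{-\frac{d}{2}(q+1)}|t-\tau|^{-q\mathfrak{u}}\big), \tag{A.27}$$

while if $1 \le t,\tau \le T$ and $1 \le i \le N$, then

$$|\mathrm{cov}\{X_{t,i}K_{b_i}(Z_{t,i} - z), X_{\tau,i}K_{b_i}(Z_{\tau,i} - z)\}| \lesssim \min\big(b_i^{-d(1-q_G)};\ b_i^{-d(q+1)}|t-\tau|^{-q\mathfrak{v}}\big). \tag{A.28}$$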

Proof. Using Assumption 5.1 together with Hölder's inequality, and recalling that $q_G =
1 - 2/p_G$ with $p_G^{-1} + \bar p_G^{-1} = 1$ and $\|G_{i,j}^{(t,\tau)}\|_{p_G}$ is uniformly bounded, we have

$$|\mathrm{cov}\{X_{t,i}K_{b_i}(Z_{t,i} - z), X_{\tau,j}K_{b_j}(Z_{\tau,j} - z)\}| \lesssim (b_i \cdot b_j)^{-d/p_G},$$

where the bound is obtained by using Lyapunov's inequality, which gives $\|K\|_{\bar p_G} < \infty$. This
gives the first bound in both (A.27) and (A.28). On the other hand, we have
$\mathbb{E}[|X_{t,i}K_{b_i}(Z_{t,i} - z)|^p] < \infty$ with $p = 2/(1 - q) > 2$ (Assumption 5.1 (i)). Therefore, using the 2-mixing
property of the panel time series $\{(X_{t,i}, Z_{t,i})\}$ together with Hall and Heyde [1980], Theorem
A.6, for $i \ne j$, we obtain

$$|\mathrm{cov}\{X_{t,i}K_{b_i}(Z_{t,i} - z), X_{\tau,j}K_{b_j}(Z_{\tau,j} - z)\}| \lesssim \big\{\mathbb{E}[|X_{t,i}K_{b_i}(Z_{t,i} - z)|^p] \cdot \mathbb{E}[|X_{\tau,j}K_{b_j}(Z_{\tau,j} - z)|^p]\big\}^{1/p} \cdot |t-\tau|^{-q\mathfrak{u}}, \tag{A.29}$$

while for $i = j$

$$|\mathrm{cov}\{X_{t,i}K_{b_i}(Z_{t,i} - z), X_{\tau,i}K_{b_i}(Z_{\tau,i} - z)\}| \lesssim \big\{\mathbb{E}[|X_{t,i}K_{b_i}(Z_{t,i} - z)|^p]\big\}^{2/p} \cdot |t-\tau|^{-q\mathfrak{v}}. \tag{A.30}$$

Since under Assumption 5.1 the function $g_i^{(p)}(\cdot) = \mathbb{E}[|X_{t,i}|^p \mid Z_{t,i} = \cdot\,] f_i(\cdot)$ is uniformly
bounded and $\|K\|_p < \infty$ we have

$$\big\{\mathbb{E}[|X_{t,i}K_{b_i}(Z_{t,i} - z)|^p]\big\}^{2/p} \lesssim b_i^{-d(q+1)}. \tag{A.31}$$

Therefore, (A.29) together with (A.31) gives the second bound in (A.27), whereas (A.30) and
(A.31) lead to the second bound in (A.28), which proves the result. □

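Lemma A.9. Suppose $\{X_{t,i}\}$ satisfies (5.1), for all $i,j \in \mathbb{N}$, $\{(X_{t,i}, Z_{t,i}, X_{t,j}, Z_{t,j})\}$ is a
stationary time series, and the panel time series $\{(X_{t,i}, Z_{t,i})\}$ is 2-mixing with size $\mathfrak{v}$ and
$\mathfrak{u}$ (as defined in Definition 5.1) and Assumption 5.1 is satisfied for some $q_F, q \in (0, 1)$. If
$1 \le t,\tau \le T$ and $1 \le i < j \le N$, then

$$|\mathrm{cov}\{K_{b_i}(Z_{t,i} - z), K_{b_j}(Z_{\tau,j} - z)\}| \lesssim \min\big((b_ib_j)^{-\frac{d}{2}(1-q_F)};\ (b_ib_j)^{-\frac{d}{2}(1+q)}|t-\tau|^{-q\mathfrak{u}}\big), \tag{A.32}$$

while if $1 \le t,\tau \le T$ and $1 \le i \le N$, then

$$|\mathrm{cov}\{K_{b_i}(Z_{t,i} - z), K_{b_i}(Z_{\tau,i} - z)\}| \lesssim \min\big(b_i^{-d(1-q_F)};\ b_i^{-d(1+q)}|t-\tau|^{-q\mathfrak{v}}\big). \tag{A.33}$$

Proof. The proof is very similar to the proof of Lemma A.8 and we omit the details. □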
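We use the lemma below to prove Theorem 5.1, which requires the following definitions

$$\hat g := \frac{1}{N}\sum_{i=1}^{N} \hat g_i, \qquad g := \frac{1}{N}\sum_{i=1}^{N} g_i, \qquad \hat f := \frac{1}{N}\sum_{i=1}^{N} \hat f_i, \qquad f := \frac{1}{N}\sum_{i=1}^{N} f_i.$$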

Lemma A. 10. Let us suppose that all assumptions in Theorem 5.1 hold. Then we have 

i=l 

and 

\[
E|\hat{f}(z) - f(z)|^2 \lesssim \frac{1}{N}\sum_{i=1}^{N}\Bigl[ b_i^{2\rho_i} + T^{-1} \cdot b_i^{-d} + T^{-[(qu)\wedge 1]} \cdot b_i^{-d(1+q+q_F-q_F[(qu)\vee 1])} \Bigr]. \tag{A.35}
\]

Proof. We only give the details for the proof of the MSE of the estimator $\hat{g}$. The proof of the other result is very similar (but uses the bounds given in Lemma A.9 rather than Lemma A.8) and we omit the details. Consider the standard variance and bias decomposition

\[
E|\hat{g}(z) - g(z)|^2 = \mathrm{var}(\hat{g}(z)) + |E\hat{g}(z) - g(z)|^2. \tag{A.36}
\]

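Under the stated assumptions we will derive the following two bounds. The bias is bounded by

\[
|E\hat{g}(z) - g(z)|^2 \lesssim \frac{1}{N}\sum_{i=1}^{N} b_i^{2\rho_i}, \tag{A.37}
\]

while for the variance we have

\[
\begin{aligned}
\mathrm{var}(\hat{g}(z)) \lesssim{}& T^{-[(qu)\wedge 1]} \cdot \frac{1}{N}\sum_{i=1}^{N}\Bigl[ b_i^{-d} + b_i^{-d(1+q+q_G-q_G[(qu)\vee 1])} \Bigr] \\
&+ \bigl(N T^{[(qv)\wedge 1]}\bigr)^{-1} \cdot \frac{1}{N}\sum_{i=1}^{N}\Bigl[ b_i^{-d} + b_i^{-d(1+q+q_G-q_G[(qv)\vee 1])} \Bigr].
\end{aligned} \tag{A.38}
\]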

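We now prove (A.37). Using iterated conditional expectations we can write

\[
E\hat{g}(z) = \frac{1}{NT}\sum_{t=1}^{T}\sum_{i=1}^{N} E\bigl(E[X_{t,i} \mid Z_{t,i}]\, K_{b_i}(Z_{t,i}-z)\bigr) = \frac{1}{N}\sum_{i=1}^{N} \int du\, g_i(u) K_{b_i}(u-z),
\]

where $g_i = E[X_{t,i} \mid Z_{t,i} = \cdot\,]\, f_i(\cdot)$ for all $t, i$. Since $g = \frac{1}{N}\sum_{i=1}^{N} g_i$ with $g_i \in \mathcal{G}_{s_i,\Delta}$ and $K$ is a multiplicative kernel of order $r$ with $\int du\, |u|^{r} K(u) \le S_K$, using a Taylor expansion up to order $\rho_i = \min(r, s_i)$ leads to $E\hat{g}(z) = g(z) + \frac{1}{N}\sum_{i=1}^{N} b_i^{\rho_i} R_i$ with remainder $|R_i| \le \Delta S_K < \infty$. Thus applying Jensen's inequality we obtain (A.37).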
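
To make the object of the bias calculation concrete, the following is a minimal sketch of a pooled kernel estimate of the form $(NT)^{-1}\sum_{i}\sum_{t} X_{t,i}K_{b_i}(Z_{t,i}-z)$, which is the form implied by the expectation above. The function names, the Gaussian kernel and the restriction to $d = 1$ are assumptions made for illustration, not the authors' implementation.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as a second-order kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def panel_g_hat(x, z_obs, bandwidths, z):
    """Pooled estimate (1/(N*T)) * sum_i sum_t x[t][i] * K_{b_i}(z_obs[t][i] - z).

    x, z_obs   : T-by-N nested lists (rows are time points, columns individuals)
    bandwidths : list of N positive bandwidths b_i, one per individual
    z          : evaluation point (dimension d = 1 in this sketch)
    """
    T = len(x)
    N = len(bandwidths)
    total = 0.0
    for t in range(T):
        for i in range(N):
            b = bandwidths[i]
            # For d = 1 the scaled kernel is K_b(u) = K(u / b) / b.
            total += x[t][i] * gaussian_kernel((z_obs[t][i] - z) / b) / b
    return total / (N * T)


# Hypothetical toy panel: T = 2 time points, N = 2 individuals.
x = [[1.0, 1.0], [1.0, 1.0]]
z_obs = [[0.0, 0.0], [0.0, 0.0]]
print(panel_g_hat(x, z_obs, [0.5, 1.0], 0.0))
```

Each individual keeps its own bandwidth $b_i$, which is why the bias and variance bounds appear as averages over $i$.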
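In order to prove (A.38) we consider the expansion

\[
\mathrm{var}(\hat{g}(z)) = A_1 + A_2 + A_3 + A_4 \tag{A.39}
\]

with

\[
\begin{aligned}
A_1 &= \frac{1}{N^2T^2}\sum_{t=1}^{T}\sum_{i=1}^{N} \mathrm{var}\bigl(X_{t,i}K_{b_i}(Z_{t,i}-z)\bigr), \\
A_2 &= \frac{2}{N^2T^2}\sum_{t=1}^{T}\sum_{j>i} \mathrm{cov}\{X_{t,i}K_{b_i}(Z_{t,i}-z),\, X_{t,j}K_{b_j}(Z_{t,j}-z)\}, \\
A_3 &= \frac{4}{N^2T^2}\sum_{t>\tau}\sum_{j>i} \mathrm{cov}\{X_{t,i}K_{b_i}(Z_{t,i}-z),\, X_{\tau,j}K_{b_j}(Z_{\tau,j}-z)\}, \\
A_4 &= \frac{2}{N^2T^2}\sum_{t>\tau}\sum_{i=1}^{N} \mathrm{cov}\{X_{t,i}K_{b_i}(Z_{t,i}-z),\, X_{\tau,i}K_{b_i}(Z_{\tau,i}-z)\}.
\end{aligned}
\]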



We will show that

\[
|A_1|, |A_2| \lesssim T^{-1} \cdot \frac{1}{N}\sum_{i=1}^{N} b_i^{-d},
\]

\[
|A_3| \lesssim T^{-[(qu)\wedge 1]} \cdot \frac{1}{N}\sum_{i=1}^{N}\Bigl[ b_i^{-d} + b_i^{-d(1+q+q_G-q_G[(qu)\vee 1])} \Bigr],
\]

and

\[
|A_4| \lesssim \bigl(N T^{[(qv)\wedge 1]}\bigr)^{-1} \cdot \frac{1}{N}\sum_{i=1}^{N}\Bigl[ b_i^{-d} + b_i^{-d(1+q+q_G-q_G[(qv)\vee 1])} \Bigr].
\]

Furthermore, if $0 < q(v \wedge u) < q/q_G + 1$ then the terms $|A_1|$ and $|A_2|$ are dominated by $|A_3| + |A_4|$, whereas for $q(v \wedge u) > q/q_G + 1$ all the terms are of the same order. Therefore, the bound derived for $|A_3| + |A_4|$ will lead to the estimate in (A.38).
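First let us consider $A_1$. Due to stationarity, we obtain the bound

\[
A_1 \le \frac{1}{NT} \cdot \frac{1}{N}\sum_{i=1}^{N} E\bigl|X_{1,i}K_{b_i}(Z_{1,i}-z)\bigr|^2 .
\]

Thereby, under the assumption that the functions $\tilde{g}_i(\cdot) := E[|X_{1,i}|^2 \mid Z_{1,i} = \cdot\,]\, f_i(\cdot)$ are uniformly bounded and the kernel satisfies $\|K\|_2 < \infty$, we have $A_1 \lesssim (N \cdot T)^{-1} \cdot \frac{1}{N}\sum_{i=1}^{N} b_i^{-d}$.
It is straightforward to show that $T \cdot |A_2|$ is bounded by

\[
\frac{2}{N^2}\sum_{j>i} \mathrm{var}\bigl(X_{1,i}K_{b_i}(Z_{1,i}-z)\bigr)^{1/2}\, \mathrm{var}\bigl(X_{1,j}K_{b_j}(Z_{1,j}-z)\bigr)^{1/2} \lesssim \frac{2}{N^2}\sum_{j>i} (b_i \cdot b_j)^{-d/2},
\]

where the inequality above follows by applying the same arguments as those used for $A_1$. Therefore using the Cauchy-Schwarz inequality we obtain $|A_2| \lesssim T^{-1} \cdot \frac{1}{N}\sum_{i=1}^{N} b_i^{-d}$.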
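
The Cauchy-Schwarz step for the cross terms over pairs of individuals can be spelled out as follows. This is a sketch of the omitted arithmetic under the assumption that all bandwidths $b_i$ are positive, not a quotation from the source.

```latex
\sum_{j>i} (b_i b_j)^{-d/2}
  \le \Bigl(\sum_{i=1}^{N} b_i^{-d/2}\Bigr)^{2}
  \le N \sum_{i=1}^{N} b_i^{-d},
\qquad\text{so that}\qquad
\frac{1}{N^{2}} \sum_{j>i} (b_i b_j)^{-d/2}
  \le \frac{1}{N}\sum_{i=1}^{N} b_i^{-d} .
```

The first inequality only adds nonnegative terms to the double sum; the second is the Cauchy-Schwarz inequality applied to the vectors $(1,\dots,1)$ and $(b_1^{-d/2},\dots,b_N^{-d/2})$.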
The term $T \cdot |A_3|$ is bounded by the sum

\[
\frac{8}{N^2}\sum_{j>i}\sum_{t=2}^{T} \bigl|\mathrm{cov}\{X_{t,i}K_{b_i}(Z_{t,i}-z),\, X_{1,j}K_{b_j}(Z_{1,j}-z)\}\bigr| .
\]

We now derive bounds for $T \cdot |A_3|$ for different mixing sizes. If $qu < 1$ then we estimate the inner sum using the second bound in (A.27) of Lemma A.8, i.e., $T \cdot |A_3| \lesssim \frac{8}{N^2}\sum_{j>i} (b_i b_j)^{-\frac{d}{2}(1+q)}\, T^{1-qu}$, which leads together with the Cauchy-Schwarz inequality to $|A_3| \lesssim T^{-qu} \cdot \frac{1}{N}\sum_{i=1}^{N} b_i^{-d(1+q)}$. On the other hand, if $qu > 1$ then we partition the inner sum into two parts which we estimate separately using now the two bounds in (A.27) of Lemma A.8, thus giving us



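\[
T \cdot |A_3| \lesssim \frac{8}{N^2}\sum_{j>i}\Bigl[ \sum_{t=2}^{h_{ij}} (b_i b_j)^{-\frac{d}{2}(1-q_G)} + \sum_{t=h_{ij}+1}^{T} (b_i b_j)^{-\frac{d}{2}(1+q)}\,(t-1)^{-qu} \Bigr].
\]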



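Thereby using $h_{ij} \sim (b_i b_j)^{-\frac{d}{2} q_G}$ together with the Cauchy-Schwarz inequality we obtain $T \cdot |A_3| \lesssim \frac{1}{N}\sum_{i=1}^{N}\bigl[ b_i^{-d} + b_i^{-d(1+q+q_G-q_G qu)} \bigr]$. Combining the bounds in the two cases $qu < 1$ and $qu > 1$ we have $|A_3| \lesssim T^{-[(qu)\wedge 1]} \cdot \frac{1}{N}\sum_{i=1}^{N}\bigl[ b_i^{-d} + b_i^{-d(1+q+q_G-q_G[(qu)\vee 1])} \bigr]$.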
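
The partition of the inner sum in the case where the mixing exponent exceeds one can be checked numerically. The sketch below compares the truncated sum $\sum_{t} \min(A, B t^{-a})$ with the split bound $hA + B h^{1-a}/(a-1)$. The function names, the single common bandwidth and all numerical values are hypothetical, and the split point of order $b^{-d q_G}$ is an assumption inferred from the stated rate.

```python
import math


def truncated_sum(A, B, a, T):
    """sum_{t=1}^{T} min(A, B * t**(-a)): the inner sum using both covariance bounds."""
    return sum(min(A, B * t ** (-a)) for t in range(1, T + 1))


def split_bound(A, B, a, h):
    """Upper bound h*A + B*h**(1-a)/(a-1), valid for a > 1 and any integer h >= 1.

    The first h terms use the constant bound A; the tail uses
    sum_{t>h} t**(-a) <= integral from h to infinity of x**(-a) dx = h**(1-a)/(a-1).
    """
    if a <= 1 or h < 1:
        raise ValueError("need a > 1 and h >= 1")
    return h * A + B * h ** (1 - a) / (a - 1)


# Illustrative (hypothetical) values: one common bandwidth b, dimension d = 1.
b, d, q, q_G, a = 0.1, 1, 0.5, 0.5, 1.5   # a plays the role of the mixing exponent > 1
A = b ** (-d * (1 - q_G))                 # lag-free covariance bound
B = b ** (-d * (1 + q))                   # prefactor of the mixing-based bound
h = math.ceil(b ** (-d * q_G))            # assumed split point
print(truncated_sum(A, B, a, 1000), split_bound(A, B, a, h))
```

With $h$ of order $b^{-d q_G}$ the two pieces are of order $b^{-d}$ and $b^{-d(1+q+q_G-q_G a)}$, which matches the two bandwidth terms appearing in the bound for this case.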
The term $N \cdot T \cdot |A_4|$ is bounded by

\[
\frac{4}{N}\sum_{i=1}^{N}\sum_{t=2}^{T} \bigl|\mathrm{cov}\{X_{t,i}K_{b_i}(Z_{t,i}-z),\, X_{1,i}K_{b_i}(Z_{1,i}-z)\}\bigr| .
\]

We estimate the inner sum applying the same arguments as those used for $|A_3|$ (but use the bounds given in (A.28) rather than (A.27) of Lemma A.8). Thereby we obtain $N \cdot T^{[(qv)\wedge 1]} \cdot |A_4| \lesssim \frac{1}{N}\sum_{i=1}^{N}\bigl[ b_i^{-d} + b_i^{-d(1+q+q_G-q_G[(qv)\vee 1])} \bigr]$.

Proof of Theorem 5.1. The proof is very similar to the proof of Theorem 4.1 (but uses Lemma A.10 rather than Lemmas A.4 and A.5) and we omit the details. □

Proof of Corollary 5.2. Under the assumptions of the corollary we have

\[
|\hat{\varphi}(z) - \varphi(z)|^2 = O_p\Bigl( \frac{1}{N}\sum_{i=1}^{N}\bigl\{ b_i^{2\rho_i} + T^{-1} \cdot b_i^{-d} + T^{-[(qu)\wedge 1]} \cdot b_i^{-d(1+q+q_{FG}-q_{FG}[(qu)\vee 1])} \bigr\} \Bigr) \tag{A.40}
\]

applying Theorem 5.1, where the term $N^{-1} \cdot T^{-[(qv)\wedge 1]} \cdot b_i^{-d(1+q+q_{FG}-q_{FG}[(qv)\vee 1])}$ is asymptotically negligible when $v > u$. Under the assumption on the bandwidths the result is obtained by balancing the three terms in the bound (A.40). □

Proof of Corollary 5.3. The result is obtained by applying Theorem 5.1, where given $v \le u$ the assumption on the bandwidths provides the balance of the four terms of the bound (5.5). □

Proof of Corollary 5.4. Under the assumptions of the corollary we apply Corollary 5.3, 
where given A''''' ~ J'(7i-5i) fgj. ggme Ci > the bound (5.8) simplifies to 

N 
i=l 

Since D < u for each i G N impHes = 0(1). We obtain the result, if 

for each i G N we have T '^i ' = 0(1) or equivalently 6i/[{q\)) A 1] > Ci, which 

is just the condition given in Corollary 5.4. □ 

A.4 Covariances and 2-mixing rates for linear processes

We use the results derived in this section in Section 3.1, where we compared the rates of convergence for linear processes with the rates in the general 2-mixing case.

Let us suppose $\{Z_t\}$ satisfies the linear process representation in (3.6). By placing some additional conditions on the innovations we have the following lemma, which is due to Giraitis et al. [1996], Lemmas 1 and 2.

Lemma A.11 (Giraitis et al. [1996]). Suppose $\{Z_t\}$ is a linear process which satisfies (3.6),
and cov{Zq, Zt) < Ct^^ Let f be the density of Zt and ft denote the joint density ZQ,Zt. If 
E(|ef I) < OG, and for a//n G M suppose the characteristic function satisfies |E[exp(— inei)] | < 
(i+l'^D'i /^'^ some 6 > 0, then the joint density satisfies the relation 

ftix,y) = f[x)f[y)+r[t)f\x)f\y)+0(t-'-''), 

where f G -^^i(M) and r{t) = cov{Zo, Zt) , for some < d < min(|, ^j^)- 

Using the result above the MSE of the kernel estimator with observations from a linear 
process can be derived. 

For most processes, there is no direct correspondence between the 2-mixing size and the covariance size. However, for Gaussian processes both sizes are linked by the inequality

'""'tyt" ^ |P(AnB)-P(^)P(B)|<2.!;^%M (A.41) 

var(Xo) AecT{Zo),Bea{Zt) var(Xo) 

(see Doukhan [1994], Section 2.1), thus the covariance and the 2-mixing sizes are the same. 
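
A quick way to see why the two sizes agree in the Gaussian case is Sheppard's formula: for a standardised bivariate normal pair with correlation $\rho$, $P(X>0, Y>0) - P(X>0)P(Y>0) = \arcsin(\rho)/(2\pi)$. Since the supremum over all events is at least the value on these two quadrant events, the mixing coefficient cannot decay faster than the correlation; the matching upper bound is what the cited reference supplies. The sketch below only evaluates this closed form, and the function name is hypothetical.

```python
import math


def quadrant_dependence(rho):
    """P(X > 0, Y > 0) - P(X > 0) * P(Y > 0) for a standard bivariate normal pair
    with correlation rho; by Sheppard's formula this equals arcsin(rho) / (2*pi)."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    return math.asin(rho) / (2.0 * math.pi)


for rho in (0.05, 0.2, 0.5, 0.9):
    gap = quadrant_dependence(rho)
    # The ratio gap / rho stays between 1/(2*pi) and 1/4 for 0 < rho <= 1.
    print(rho, gap, gap / rho)
```

Because $\arcsin(\rho) \ge \rho$ on $[0,1]$, the dependence gap on the quadrant events is at least $\rho/(2\pi)$, so a covariance decaying like a power of $t$ forces the same power for the 2-mixing coefficient from below.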
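Suppose that $\{Z_t\}$ satisfies (3.6), where the innovations are Gaussian and $|a_j| \lesssim j^{-\theta}$. Then we have

\[
\left.
\begin{array}{l}
|\mathrm{cov}(X_0, X_t)| \\[2pt]
\text{and} \\[2pt]
\sup\limits_{A \in \sigma(Z_0),\, B \in \sigma(Z_t)} |P(A \cap B) - P(A)P(B)|
\end{array}
\right\}
\lesssim
\begin{cases}
t^{-2\theta+1} & \text{if } 1/2 < \theta < 1; \\
t^{-\theta} & \text{if } \theta > 1.
\end{cases} \tag{A.42}
\]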

We now consider more general linear processes, which are not necessarily Gaussian. 
Then the covariance size does not immediately give the 2-mixing size. However, if the 
density of the innovations satisfies certain smoothness conditions then we can obtain the 
following bound. 

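Lemma A.12. Suppose $\{Z_t\}$ is a linear process which satisfies the representation $Z_t = \sum_{j=0}^{\infty} a_j \varepsilon_{t-j}$, where the parameters satisfy $|a_j| \le C j^{-\theta}$ and $\theta > 1/2$. Let $f_\varepsilon$ be the density of the innovation $\varepsilon_t$. If $E(|\varepsilon_t|^{\ell}) < \infty$ (where $\ell > 2$) and $\int |f_\varepsilon(x+a) - f_\varepsilon(x)|\,dx \le C|a|$, then we have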

sup \P{AnB)- P{A)P{B)\ < C7j(-2^+i)2(iTT) 

Aea{Zo),B<^a{Zt) 

where C are some arbitrary constants. 

Proof. The result can be proved using a straightforward adaptation of the proofs in Chanda [1974], Gorodetskii [1977] and Davidson [1994] (Theorem 14.9), who proved the result for strong α-mixing. Hence we omit the details. □

Remark A.1. It is interesting to compare the 2-mixing sizes derived in Lemma A.12 with the strong α-mixing results for MA(∞) processes. Under the same set of conditions, but with the additional restriction that $\theta > 3/2$, we have that

sup \P[Af^B) - P[A)P[B)\ < 

Ae<7(Zo,Z-i,...),Bea(Zt,Zt+i,...) 
In other words, the 2-mixing size is larger than the α-mixing size. This is because, by definition, the σ-algebras involved in the definition of α-mixing are far larger than the σ-algebras in the definition of 2-mixing, thus allowing more extreme cases. □

Comparing Lemma A.12 with the covariance size given in (A.42), we see that when the Gaussianity assumption is relaxed the covariance and 2-mixing sizes no longer coincide. However, by using Lemma A.12 and Hall and Heyde [1980], Theorem A.5, we have the upper and lower bounds

Aea{Zo),BecT{Zt) 

Therefore the 2-mixing size $v$ of the linear process $\{Z_t\}$ is bounded by